
Re-Triggering for a [root] cause – some notes from doing it

With all this talk of intermittent failures and folks coming up with ideas on how to solve them, I figured I should dive headfirst into looking at failures.  I have been working with a few folks on this, specifically :parkouss and :vaibhav1994.  In this experiment (actually the second time I have done this), I take a given intermittent-failure bug and re-trigger it.  If it reproduces, I then go back in history looking for where it became intermittent.  This weekend I wrote up some notes as I was trying to define what an intermittent is.

Let's first outline the parameters for this experiment:

  • All bugs marked with keyword ‘intermittent-failure’ qualify
  • Bugs must not be resolved or assigned to anybody
  • Bugs must have been filed in the last 28 days (we only keep 30 days of builds)
  • Bugs must have >=20 tbplrobot comments (arbitrarily picked; ideally we want something that can easily be reproduced)

Here is what comes out of this:

  • 356 bugs are open, not assigned and have the intermittent-failure keyword
  • 25 bugs have >=20 comments meeting our criteria

The next step was to look at each of the 25 bugs and see if it made sense to do this.  In fact, I decided not to take action on 13 of the bugs (remember this is an experiment, so my reasoning for ignoring these 13 could be biased):

  • 5 bugs were Thunderbird/mozmill only
  • 3 bugs looked to be related to Android harness issues
  • Bug 1157090 hadn't reproduced in 2 weeks; it was an APZ feature which we turned off
  • 1 bug was on mozilla-beta only
  • 2 bugs had patches, and 1 had a couple of comments indicating a developer was already looking at it

This leaves us with 12 bugs to investigate.  The premise here is easy: find the first occurrence of the intermittent (branch, platform, test job, revision) and re-trigger it 20 times (a number picked for simplicity).  When the results are in, see if we have reproduced it.  In fact, only 5 bugs reproduced the exact error in the bug when re-triggered 20 times on a specific job that showed the error.
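In code, the per-bug procedure (including the walk back through the pushlog described next) looks roughly like the sketch below.  This is only an illustration, not the actual tooling; the real work was done by hand with Treeherder re-triggers, and reproduces() is a hypothetical stand-in for "re-trigger the job 20 times on one revision and check the logs for the exact failure from the bug".

    from typing import Callable, List, Optional, Tuple

    def find_window(pushlog: List[str], first_bad: int,
                    reproduces: Callable[[str], bool]) -> Tuple[Optional[str], str]:
        # pushlog is ordered oldest -> newest; first_bad is the index of the
        # push where the failure was first reported (and confirmed to reproduce).
        # Walk back in increasingly aggressive steps until the failure stops
        # reproducing, and return (last_good, earliest_bad): the window in
        # which the intermittent was introduced.
        earliest_bad = pushlog[first_bad]
        step = 2
        index = first_bad
        while index > 0:
            index = max(0, index - step)
            candidate = pushlog[index]
            if reproduces(candidate):
                earliest_bad = candidate   # still intermittent this far back
                step *= 2                  # get more aggressive with each jump
            else:
                return candidate, earliest_bad
        return None, earliest_bad          # no clean push found in the available range

Almost all of the cost is inside reproduces(), since each call means 20 re-triggered jobs and a wait for the results.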

Moving on, I started re-triggering jobs further back in the pushlog to see where each failure was introduced.  I started off going back 2/4/6 revisions, but got more aggressive as I didn't see patterns.  Here is a summary of what the 5 bugs turned out like:

  • Bug 1161915 – Windows XP PGO reftest.  Found the root cause (PGO only; a lot of PGO builds were required for this) 23 revisions back.
  • Bug 1160780 – OSX 10.6 mochitest-e10s-bc1.  Found the root cause 33 revisions back.
  • Bug 1161052 – Jetpack test failures.  There are so many failures in the same test file that it isn't very clear whether I am reproducing the failure or finding other ones.  :erikvold is working on fixing the test in bug 1163796 and OK'd disabling it if we want to.
  • Bug 1161537 – OSX 10.6 mochitest-other.  Bisection didn't find the root cause, but this is a new test case which was added.  When the new test case landed it could have run 100+ times successfully, then it failed when it merged to other branches a couple of hours later!
  • Bug 1155423 – Linux debug reftest-e10s-1.  This reproduced 75 revisions in the past, and because of that I looked at what changed in our buildbot-configs and mozharness scripts.  It turns out we had just turned on this test job (we hadn't been running e10s reftests on debug prior to this), and that caused the problem.  This can't be tracked down by re-triggering jobs into the past.

In summary, out of 356 bugs, 2 root causes were found by re-triggering.  In terms of time invested, I have put in about 6 hours to track down the root causes of the 5 bugs that reproduced.



SETA – Search for Extraneous Test Automation

Here at Mozilla we run dozens of builds and hundreds of test jobs for every push to a tree.  As time has gone on, the time from a push to all tests being complete has grown from a couple of hours to 4+ hours.  With the exception of a few test jobs, we could complete our test results in

The question becomes: how do we keep up test coverage without growing the number of machines?  Right now we do this with buildbot coalescing (we queue up the jobs and skip the older ones when load is high).  While this works great, it causes us to skip a bunch of jobs (builds/tests) on random pushes, and sometimes we need to go back and manually schedule jobs to find failures.  In fact, while keeping up with the automated alerts for Talos regressions, I find that coalescing causes problems in over half of the regressions I investigate!

Knowing that we live with coalescing, and have for years, many of us started wondering whether we need all of our tests.  Ideally we could select the tests that are statistically most significant to the changes being pushed, and if those pass, run the rest of the tests when machines are available.  Getting there is tough; maybe there is a better way to solve this?  Luckily we can mine metadata from Treeherder (and the former TBPL) and determine which failures are intermittent and which have been fixed or caused by a different revision.

A few months ago we started looking into unique failures on the trees: not just the failures, but which jobs failed.  Normally when the automation detects a failure, many jobs fail at once (for example, xpcshell tests will fail on all Windows platforms, opt + debug).  When you look at the common jobs which fail across all the failures over time, you can determine the minimum number of jobs required to detect all the failures.  Keep in mind that we only need 1 job to represent a given failure.
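One way to frame "the minimum number of jobs required to detect all the failures" is as a set-cover problem: each failure maps to the set of jobs that failed for it, and we want a small set of jobs that touches every failure.  The snippet below is a greedy sketch of that idea over made-up data; it is an illustration of the concept, not the actual SETA code.

    def reduce_jobs(failures):
        # failures: dict mapping a failure id to the set of job names that
        # failed for it (e.g. every windows xpcshell job for one regression).
        # Returns a list of jobs such that every failure is detected by at
        # least one kept job.
        uncovered = set(failures)
        kept = []
        while uncovered:
            counts = {}
            for fid in uncovered:
                for job in failures[fid]:
                    counts[job] = counts.get(job, 0) + 1
            best = max(counts, key=counts.get)   # detects the most uncovered failures
            kept.append(best)
            uncovered = {fid for fid in uncovered if best not in failures[fid]}
        return kept

    # made-up example: two failures share a job, so two jobs cover three failures
    example = {
        "failure-1": {"win7 opt xpcshell", "win8 debug xpcshell"},
        "failure-2": {"win7 opt xpcshell"},
        "failure-3": {"osx10.6 opt mochitest-2"},
    }
    print(reduce_jobs(example))   # ['win7 opt xpcshell', 'osx10.6 opt mochitest-2']

A nice side effect of the greedy ordering is that trimming the kept list to its first N jobs gives a coverage curve (how many historical failures those N jobs would have caught), which is one way to produce the job-count/percentage breakdown further down in this post.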

As of today, we have data since August 13, 2014 (just shy of 180 days):

  • 1061 failures caught by automation (for desktop builds/tests)
  • 362 jobs are currently run for all desktop builds
  • 285 jobs are optional and not needed to detect all 1061 regressions

To phrase this another way, we could have run 77 jobs per push and caught every regression in the last 6 months.  Let's back up a bit and look at the regressions found: how many are there, and how often do we see them?

[Figure: cumulative and per-day regressions]

This is a lot of regressions (yay for automation).  The problem is that this is historical data, not future data.  Our tests, browser, and features change every day, so this doesn't seem very useful for predicting the future.  This is a parallel to the stock market, where people invest in companies based on historical data and make decisions based on incoming data (press releases, quarterly earnings).  This is the same concept.  We have dozens of new failures every week, and if we only relied upon the 77 test jobs (which would catch all historical regressions) we would miss new ones.  This is easy to detect, and we have mapped out the changes.  Here it is on a calendar view (bolded dates indicate a change was detected, i.e. a new job was needed in the reduced set of jobs):

[Figure: calendar view; bolded dates are when a change is needed due to new failures]

This translates to about 1.5 changes per week.  To put this another way, if we were only running the 77-job reduced set, we would have missed one regression on December 2nd, another on December 16th, etc., or on average 1.5 regressions missed per week.  In a scenario where we only ran the optional jobs once per hour on the integration branches, 1-2 times per week we would see a failure and have to backfill some jobs (as we currently do for coalesced jobs) for the last hour to find the push which caused the failure.
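Detecting those bolded dates is conceptually simple: a new failure only forces a change to the reduced set if none of the jobs we already keep would have caught it.  A tiny, illustrative check (again, not the actual SETA code):

    def needs_new_job(kept_jobs, failing_jobs):
        # kept_jobs: the current reduced set of jobs (e.g. the 77 above).
        # failing_jobs: the jobs that failed for a newly seen regression.
        # No overlap means the reduced set would have missed this regression,
        # so a new job has to be added to the list.
        return not (set(kept_jobs) & set(failing_jobs))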

To put this into perspective, here is a similar view to what you would expect to see today on treeherder:

[Figure: all desktop unittest jobs]

For perspective, here is what it would look like assuming we only ran the reduced set of 77 jobs:

[Figure: reduced set of jobs view]

* Keep in mind this is weighted such that we prefer to run jobs on linux* builds, since those run in the cloud.

So what do we plan to do with all of this information?  We plan to run the reduced set of jobs by default on all pushes, and use the 285 optional jobs as candidates for coalescing.  Currently we force coalescing for debug unittests; this was done about 6 months ago because debug tests take considerably longer than opt, so if we could run them on every 2nd or 3rd build we would save a lot of machine time.  This is only being considered on the integration trees that the sheriffs monitor (mozilla-inbound, fx-team).

Some questions that are commonly asked:

  • How do you plan to keep this up to date?
    • We run a cronjob every day and update our master list of jobs, failures, and optional jobs.  This takes about 2 minutes.
  • What are the chances the reduced set of jobs catches >1 failure?  Do we need all 77 jobs?  (see the coverage sketch after this list)
    • 77 jobs detect 1061 failures (100%)
    • 35 jobs detect 977 failures (92%)
    • 23 jobs detect 940 failures (88.6%)
    • 12 jobs detect 900 failures (84.8%)
  • How can we see the data?
    • SETA website
    • in the near future, summary emails to mozilla.dev.tree-alerts when we detect a *change*
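For the coverage question above, the percentages can be computed straight from the same failure-to-jobs data; here is a minimal sketch (illustrative only), reusing the failures mapping shape from the earlier snippet:

    def coverage(kept_jobs, failures):
        # fraction of historical failures that at least one kept job detects
        kept = set(kept_jobs)
        caught = sum(1 for jobs in failures.values() if kept & jobs)
        return caught / len(failures)

    # e.g. coverage(reduce_jobs(example)[:12], example) asks "how many of the
    # historical failures would the first 12 greedily chosen jobs have caught?"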

Thanks for reading this far!  This project wouldn't be here if it wasn't for the many hours of work by Vaibhav; he continues to find more ways to contribute to Mozilla.  If anything, this should inspire you to think more about how our scheduling works and what great things we can do if we think out of the box.



Where did all the good first bugs go?

As we are in the short window for Google Summer of Code applications, I have seen a lot of requests for mochitest-related bugs to work on.  Normally we look for new bugs with the Bugs Ahoy! tool.  Most of those have been picked through, so I spent some time going through a bunch of mochitest/automation-related bugs.  Many of the bugs I found were outdated, duplicates of other things, or no longer apply to the tools we use today.

Here is my short list of bugs to get more familiar with automation while fixing bugs which solve real problems for us:

  • Bug 958897 – ssltunnel lives if mochitest killed
  • Bug 841808 – mozfile.rmtree should handle windows directory in use better
  • Bug 892283 – consider using shutil.rmtree and/or distutils remove_tree for mozfile
  • Bug 908945 – Fix automation.py’s exit code handling
  • Bug 912243 – Mochitest shouldnt chdir in __init__
  • Bug 939755 – With httpd.js we sometimes don’t get the most recent version of the file

I have added the appropriate tags to those bugs to mark them as good first bugs.  Please take time to look over a bug and ask questions in it to get a full understanding of what needs to be done and how to test it.

Happy hacking!

