It has been 6 months since the last Stockwell update. With new priorities over many months and a reduced effort on Stockwell, I overlooked sending updates. We have still spent a reasonable amount of time hacking on Stockwell, but the work has been less transparent.
I want to cover where we were a year ago, and where we are today.
One year ago today I posted on my blog about defining intermittent. We were just starting to focus on learning about failures. We collected data, read bugs, interviewed many influential people across Mozilla, and came up with a plan, Stockwell, which we presented at the Hawaii all hands. Our plan was to do a few things:
- Triage all failures >=30 instances/week
- Build tools to make triage easier and collect more data
- Adjust policy for triaging, disabling, and managing intermittents
- Make our tests better with linting and test-verification
- Invest time into auto-classification
- Define test ownership and triage models that are scalable
We haven’t focused 100% on intermittent failures in the last 52 weeks, but we did spend about half of that time on them, and we have achieved a few things:
- Triaged all failures >= 30 instances/week (most weeks, never more than 3 weeks off)
- Made many improvements to our tools, including adjusting the Orange Factor robot and intermittent-bug-filer, and adding |mach test-info|
- Experimented with policy, turning it on and off, and settled on a needinfo to the “owner” at 30+ failures/week and disabling the test at 200 failures in 30 days
- Added eslint to our tests and pylint for our tools, and the new test-verification (TV) job runs as tier-2
- Added source file -> Bugzilla component mappings in-tree to define ownership
- 31 Bugzilla components now triage their own intermittents
While that is a lot of changes, they have been incremental yet effective. We started with an Orange Factor of 24+, and we now often see <12 (although last week it was closer to 14). Over the same period we have added many tests, almost doubling our test load, and the Orange Factor has remained low. We still don’t consider that success: we often have 50+ bugs in a state of “needswork”, and it would be better to have <20 in progress at any one time. We are also still ignoring half the problem, namely all the other failures that do not cross our threshold of 30 failures/week.
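The two policy thresholds mentioned above (needinfo at 30+ failures/week, disable at 200 failures in 30 days) can be sketched as a small decision function. This is only an illustration, not the code our tools run: the names `FailureCounts` and `triage_action` are made up for this example, and treating "200 failures" as "200 or more" is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the policy described in this post.
NEEDINFO_WEEKLY_THRESHOLD = 30   # needinfo the owner at 30+ failures/week
DISABLE_30_DAY_THRESHOLD = 200   # disable at 200 failures in 30 days (">=" is an assumption)


@dataclass
class FailureCounts:
    """Hypothetical container for one intermittent bug's failure counts."""
    last_7_days: int
    last_30_days: int


def triage_action(counts: FailureCounts) -> str:
    """Return the policy action for a bug: 'disable', 'needinfo', or 'none'."""
    # Check the stronger action first so a very noisy test is not just needinfo'd.
    if counts.last_30_days >= DISABLE_30_DAY_THRESHOLD:
        return "disable"
    if counts.last_7_days >= NEEDINFO_WEEKLY_THRESHOLD:
        return "needinfo"
    # Below both thresholds: this is the half of the problem we do not triage yet.
    return "none"


if __name__ == "__main__":
    print(triage_action(FailureCounts(last_7_days=35, last_30_days=90)))
    print(triage_action(FailureCounts(last_7_days=60, last_30_days=240)))
    print(triage_action(FailureCounts(last_7_days=5, last_30_days=20)))
```

In practice a disable usually applies only to the affected configurations, so a real version would need per-configuration counts rather than one total per bug.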
Some statistics about bugs over the last 9 months (since January 1st):
As you can see, that is a lot of disabled tests. Note that we usually only disable a test on a subset of the configurations, not 100% across the board. Also note that unknown bugs are ones that were failing frequently and, for some undocumented reason, have reduced in frequency.
One other interesting piece of data: we have tried to associate many of the fixed bugs with a root cause. We have done this for 265 bugs, and 90 of them (about a third) are actual product fixes 🙂 The rest are harness, tooling, infra, or, more commonly, test case fixes.
I will be writing some follow-up posts with details of the changes we have made over the year, including:
- Triage process for component owners and others who want to participate
- Test verification and the future
- Workflow of an intermittent, from first failure to resolution
- Future of Orange Factor and Autoclassification
- Vision for the future in 6 months
Please note that the 511 bugs that were fixed were fixed by the many great developers we have at Mozilla. These were often randomized requests in a very busy schedule, so if you are reading this and you fixed an intermittent, thank you!