Ok, the title might be misleading, but as of the last few days we are under 5% orange for Android unit tests on mozilla-central. Part of the reason is that we have hidden the J1 and R2 jobs from the results. We are tracking these in the weekly mobile automation meetings, and will continue to do so until those tests are live again.
For specifics on our test failure distribution, please check out this spreadsheet and look at the different sheets. We have spent the last few weeks trying to reproduce these failures, and the only concrete, reproducible bug we could come up with was bug 691073.
We will continue to fix the oranges we see in the other tests, as well as reduce the number of red/purple runs.
At Mozilla we have made unit testing on Android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except for code that genuinely breaks a test, but in reality we allow for random failures and infrastructure failures. Our current goal is 5%.
So what are these acceptable failures, and what does 5% really mean? Failures can happen when we have tests which fail randomly: usually poorly written tests, or tests which were written a long time ago and hacked to work in today's environment. This doesn't mean any test that fails is a problem; it could be that a previous test changed a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly, or it crashed in the middle of the test. Other failures include the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems; with our current state of testing this mostly falls into the category of losing connectivity to the device. Infrastructure problems are indicated as red or purple, and test-related problems as orange.
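To make the color buckets concrete, here is a minimal sketch of how failure rates like these can be computed from a batch of job outcomes. The job results and counts below are made up for illustration; this is not Mozilla's actual tooling, just the arithmetic behind the percentages.

```python
from collections import Counter

# Hypothetical outcomes for ten Android test jobs:
# "green" = pass, "orange" = test failure, "red"/"purple" = infrastructure failure.
runs = ["green", "orange", "green", "red", "green",
        "green", "purple", "green", "orange", "green"]

counts = Counter(runs)
total = len(runs)

orange_rate = counts["orange"] / total                   # test-related failures
infra_rate = (counts["red"] + counts["purple"]) / total  # infrastructure failures
overall_rate = orange_rate + infra_rate                  # any kind of failure

print(f"orange: {orange_rate:.2%}, infra: {infra_rate:.2%}, "
      f"overall: {overall_rate:.2%}")
```

Against a 5% goal, the (made-up) sample above would be well over budget on both the orange and the red/purple side.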
I took a look at the last 10 runs on mozilla-central (where we build Firefox nightlies from) and built this little graph:
Firefox Android Failures
Here you can see that our tests are causing failures 6.67% of the time, and that 12.33% of the time we can expect some kind of failure on Android.
We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in. I did the same thing here:
mozilla-inbound Android Failures
Here you can see that our tests are causing failures 7.77% of the time, and that 9.89% of the time we can expect some kind of failure on Android.
This is only a small sample of the tests, but it should give you a good idea of where we are.