At Mozilla we have made unit testing on Android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except for code that breaks a test, but in reality we allow for random failures and infrastructure failures. Our current goal is 5%.
So what are these acceptable failures, and what does 5% really mean? Failures can happen when we have tests which fail randomly, usually poorly written tests or tests which were written a long time ago and hacked to work in today's environment. This doesn't mean that any test that fails is itself the problem; it could be a previous test that changed a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly, or it crashed in the middle of the test. Other failures are the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems. With our current state of testing, these mostly fall into the category of losing connectivity to the device. Infrastructure problems are indicated as Red or Purple, and test-related problems as Orange.
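To make the color categories concrete, here is a minimal sketch of how failure rates like these could be tallied from a list of job results. It is not the tooling we actually use: the `failure_rates` helper, the "green" label for a passing job, the sample data, and the choice to compare the total failure rate against the 5% goal are all assumptions made for illustration.

```python
from collections import Counter

# Colors as described above. "green" for a passing job is an assumption.
TEST_COLORS = {"orange"}          # test-related failures
INFRA_COLORS = {"red", "purple"}  # infrastructure failures
GOAL = 0.05                       # the 5% goal; applying it to the total is an assumption


def failure_rates(results):
    """Return (test_rate, infra_rate, total_rate) for a list of color strings."""
    if not results:
        raise ValueError("no job results to summarize")
    counts = Counter(results)
    total_jobs = len(results)
    test_failures = sum(counts[c] for c in TEST_COLORS)
    infra_failures = sum(counts[c] for c in INFRA_COLORS)
    return (
        test_failures / total_jobs,
        infra_failures / total_jobs,
        (test_failures + infra_failures) / total_jobs,
    )


if __name__ == "__main__":
    # Made-up sample, not real mozilla-central data.
    sample = ["green"] * 17 + ["orange"] * 2 + ["red"]
    test_rate, infra_rate, total_rate = failure_rates(sample)
    print(f"test: {test_rate:.2%}  infra: {infra_rate:.2%}  total: {total_rate:.2%}")
    print("within goal" if total_rate <= GOAL else "over goal")
```

Keeping the test and infrastructure counts separate makes it possible to see which kind of failure is pushing the total over the goal.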
I took a look at the last 10 runs on mozilla-central (where we build Firefox nightlies from) and built this little graph:
Here you can see that our tests are causing 6.67% of the failures, and that we can expect a failure on Android 12.33% of the time.
We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in. I did the same thing here:
Here you can see that our tests are causing 7.77% of the failures, and that we can expect a failure on Android 9.89% of the time.
This is only a small sample of the tests, but it should give you a good idea of where we are.