Last week I wrote a post with some thoughts on AutoLand and Try Server. It drew some wonderful comments, and because of that I have continued to think about the same problem space a bit more.

While chatting with Vaibhav1994 (who is doing a really awesome GSoC project this summer for Mozilla), we started brainstorming another way to resolve our intermittent orange problem.

What if we rerun the test case that caused the job to go orange (yes, for a crash, leak, or shutdown timeout we would rerun the entire job), and if the rerun is green, we deem the failure intermittent and ignore it?

With some of the work being done in bug 1014125, we could achieve this outside of buildbot, and the job could repeat itself inside the single job instance, yielding a true green.

One thought: if it is a test that is failing, we might want to run it 5 times and treat it as intermittent only if it fails 1 time; otherwise it is too intermittent to ignore.
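
To make that rule concrete, here is a minimal Python sketch. It is not how buildbot or any existing harness does this; `classify_failure`, `run_test`, and `scripted_test` are hypothetical names made up for illustration, and I am assuming "fails 1 time" means "at most 1 failure in 5 reruns".

```python
from typing import Callable, Iterable


def classify_failure(run_test: Callable[[], bool],
                     reruns: int = 5,
                     max_failures: int = 1) -> str:
    """Rerun a failed test and decide whether the failure can be ignored.

    run_test is a hypothetical callable that runs the test once and
    returns True on pass, False on failure.
    """
    # Count how many of the reruns fail.
    failures = sum(1 for _ in range(reruns) if not run_test())
    # Assumption: at most `max_failures` failures means "intermittent, ignore".
    if failures <= max_failures:
        return "intermittent"
    return "too intermittent"


def scripted_test(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Build a fake test that replays a fixed list of pass/fail outcomes."""
    remaining = iter(outcomes)
    return lambda: next(remaining)


if __name__ == "__main__":
    # One failure in five reruns: the job could be turned green.
    print(classify_failure(scripted_test([True, False, True, True, True])))
    # Two failures in five reruns: the job stays orange.
    print(classify_failure(scripted_test([False, False, True, True, True])))
```

A crash, leak, or shutdown timeout would not fit this per-test loop; as noted above, those would mean rerunning the entire job.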

A second thought: we would do this on try by default for autoland, but still show the intermittents on integration branches.

I will eventually get to my thoughts on massive chunking, but for now, let's hear more pros and cons of this idea.

  1. I like the idea. It would mean that intermittent test suites would cause pain for developers again, instead of just for the sheriffs as they do today. Perhaps if a few test suites keep failing the 1-out-of-5 rule, developers will have more incentive to fix those tests so that autoland works better.

    To be fair, I'd rather have autoland ASAP, even if we can't find an optimal solution to the intermittent problem.

    Thanks for your comment :BenWa!

    I agree. Coming up with a better-than-OK but not perfect solution to our roadblocks will get us using it and increase productivity.

