improving personal hygiene by adjusting mochitests

I had signed up to give a talk at the Summit about mochitest manifests, and while tossing around ideas (latest presentation) with my team I realized that one huge advantage is the time all developers can save if we rearrange the list of tests we run for mochitest.  I know that isn’t too clear, but my talk didn’t make the cut, so this blog post will introduce you to the concept.

The problems we are trying to solve:

  • We have hundreds of random orange tests, and the number keeps growing
  • new products (such as Fennec) leave many tests invalid or in need of serious work
  • some tests are better suited for other harnesses (we have already moved some reftests to mochitest)
  • new features (such as electrolysis) break hundreds of existing tests and require a lot of manual work before running any tests
  • other new features require preferences to be set (like d2d, oop)
  • developers often forget showers and food, which is ok if you are single and work from home

In previous discussions around a ‘manifest’ we have all been hung up on what format it should be in.  Quite frankly, I don’t care what format it is, as long as it is easy for a human to read and write and it supports dynamic attributes that can represent various conditions as input or output of the tests themselves.  Currently for mochitest, we have Makefile-based manifests:

(Image: example of a Makefile used as a manifest for mochitest)

This is not the most useful manifest, as there is limited support for input conditions, but it works.  We have other great formats available and there has been great discussion around this topic in other blog posts.  In general, the consensus is to keep something like the reftest manifest format and extend it as needed.  I don’t think it is productive to argue about formats, as implementing a parser is a trivial task and the details of syntax can be hashed out much later.

What I want to focus on is the problems outlined above and how we can solve them.  I have done some preliminary work on filtering mochitests for remote testing, which is a sort of partial implementation of a manifest.  That work used a JSON format, which I would not recommend due to the frustration I experienced editing it by hand.  The concepts I implemented shed light on the value of using tags and other metadata in our tests.  This works great for solving the Fennec and Electrolysis problems, but doesn’t do much for grouping tests by dependencies (like d2d and oop, which need preferences set).  It hints at a lame method to address the orange problem.
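To make the tagging idea a bit more concrete, here is a minimal Python sketch of filtering a test list by tags.  It is an illustration only, not the JSON-based filter mentioned above: the `ManifestEntry` class, the `select_tests` helper, the tag names and the test file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManifestEntry:
    """One test plus its metadata tags (hypothetical structure)."""
    path: str
    tags: frozenset = field(default_factory=frozenset)


def select_tests(entries, exclude_tags):
    """Return the paths of tests that carry none of the excluded tags."""
    excluded = frozenset(exclude_tags)
    return [e.path for e in entries if not (e.tags & excluded)]


# Made-up manifest: file names and tag names are assumptions.
MANIFEST = [
    ManifestEntry("test_basic.html"),
    ManifestEntry("test_popup.html", frozenset({"fails-on-fennec"})),
    ManifestEntry("test_plugin.html",
                  frozenset({"fails-on-fennec", "fails-on-e10s"})),
]

fennec_run = select_tests(MANIFEST, {"fails-on-fennec"})
e10s_run = select_tests(MANIFEST, {"fails-on-e10s"})
```

With tags like these, a Fennec run and an Electrolysis run can each drop the tests that are known not to apply, without touching the tests themselves.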

Let’s look at test dependencies briefly.  In the above Makefile example, there are two test files that are ignored if MOZ_PHOENIX is defined.  That is an example of grouping based on what I would call an environmental condition (build or platform related).  To group effectively, we might have to have overlapping requirements:

(Image: example of overlapping grouping conditions in a manifest)

There are many ways to solve this.  One method is cascading manifests that overlay each other.  This is not ideal for maintenance, but the concept is useful for supporting all the tests which we need to temporarily disable when running on an Electrolysis build.  Another method is more complex tagging.  This could get cumbersome with 5 or 6 grouping definitions, but it would give the most flexibility and granularity.  Whatever the final solution is, I would prefer that it has little to no logic in Makefiles or inside the specific tests themselves.
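As a rough illustration of the cascading idea, the Python sketch below lays a small overlay on top of a base manifest and then groups the remaining tests by the preferences they need.  Everything here is an assumption made for the example: the dictionary layout, the `disabled` and `prefs` keys, the helper names and the test file names.  Only d2d and oop come from the discussion above.

```python
def apply_overlays(base, *overlays):
    """Merge overlay manifests onto a copy of the base, later ones winning."""
    merged = {path: dict(meta) for path, meta in base.items()}
    for overlay in overlays:
        for path, meta in overlay.items():
            merged.setdefault(path, {}).update(meta)
    return merged


def runnable(manifest):
    """Return the sorted paths of tests that are not marked disabled."""
    return sorted(p for p, m in manifest.items() if not m.get("disabled", False))


def group_by_prefs(manifest):
    """Group runnable tests by the exact set of preferences they require."""
    groups = {}
    for path in runnable(manifest):
        key = tuple(sorted(manifest[path].get("prefs", {}).items()))
        groups.setdefault(key, []).append(path)
    return groups


# Hypothetical base manifest and an overlay for Electrolysis builds.
BASE = {
    "test_basic.html": {},
    "test_canvas.html": {"prefs": {"d2d": True}},
    "test_plugin.html": {"prefs": {"oop": True}},
}
E10S_OVERLAY = {"test_plugin.html": {"disabled": True}}

e10s_manifest = apply_overlays(BASE, E10S_OVERLAY)
e10s_groups = group_by_prefs(e10s_manifest)
```

The overlay only records what differs from the base, so temporarily disabling a test for one build type does not require editing the base manifest, and the grouping step keeps the preference logic out of both the Makefiles and the tests.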

The last problem to solve is the large list of [orange] tests.  One creative way to solve this is to create an overlay manifest of all orange tests.  Then when we run the tests (locally, tryserver, tinderbox), we ignore by default all known orange tests.  This saves a lot of test runtime as well as time investigating the 5-10 failures you see on the tinderbox push log:

(Image: example of a tinderbox changeset build + test, showing a list of builds and tests run for a given changeset with some known failures)

To be complete, we could create a new test type for orange, ‘or’, so we would have ‘Mo(1 2 3 4 5 oth or)’.  This would help prioritize any test failures that are seen during a test run.  We could give top priority to 1-5 and oth failures and lower priority to ‘or’ failures.  This is the shift in test lists that I was talking about at the beginning.  This concept of an orange suite starts to touch on the output part of the manifest, where we can define metadata based on the output or results of the test.  Doing this would reduce the time spent running and investigating tests locally and on try server, saving a lot of time for everybody.
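The prioritization could be sketched in a few lines of Python.  This assumes failures arrive as (suite, test) pairs using the suite labels from ‘Mo(1 2 3 4 5 oth or)’; the `triage` helper, the priority labels and the test names are made up for the example.

```python
# Only the orange suite is low priority; every other suite defaults to high.
SUITE_PRIORITY = {"or": "low"}


def triage(failures):
    """Split (suite, test) failure pairs into high and low priority lists."""
    report = {"high": [], "low": []}
    for suite, test in failures:
        report[SUITE_PRIORITY.get(suite, "high")].append((suite, test))
    return report


# Hypothetical failures from a single push.
FAILURES = [
    ("3", "test_a.html"),
    ("or", "test_flaky.html"),
    ("oth", "test_b.html"),
]

report = triage(FAILURES)
```

A failure in 1-5 or oth then means a real problem to fix or back out, while a failure in ‘or’ is expected noise that can be looked at later.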

This post should outline the problems we can solve with manifests and how we can utilize manifests to solve these problems.  If these topics and ideas sound interesting, unreasonable, half baked or otherwise thought provoking, please find me this week at the Summit and let me know what you are thinking about.

As for the biggest problem of personal hygiene, it should be obvious that the time saved with the orange test suite must be used for showers, food, haircuts and laundry!

2 responses to “improving personal hygiene by adjusting mochitests”

  1. Mook

    Unfortunately, each test could fail for various reasons – sometimes it’s the known orange, sometimes it’s something completely unrelated. If we separate out the tests that might go orange, they also can’t catch new things (because they would just be ignored).

    I believe the only real solution (sadly, not very practical in the short run) is to have no oranges – and until they are all fixed, make it a pain because it’s currently disproportionately difficult for new contributors who don’t know all of the active oranges off the top of their head. Obviously because I’m in that group 😉

  2. elvis314

    Thanks for the feedback! The Orange test suite would be run all the time, just shuffled into its own test run. The big advantage here is we can expect all other mochitests to be green 100% of the time, otherwise we need to fix the test or back the change out.
