Monthly Archives: July 2010

types of data we care about in a manifest

This is a bit controversial (similar to “what OS do you run”), but I want to start outlining what I consider useful metadata for categorizing tests.

DATA:

Currently with Reftest, we have a sandbox that provides data the manifest files can use as conditional options.  The majority of the sandbox items used are:

  • platform: MOZ_WIDGET_TOOLKIT (cocoa, windows, gtk2, qt)
  • platform: xulrunner.OS, XPCOMABI (if “”, we are on ARM)
  • environ: haveTestPlugin, isDebugBuild, windowsDefaultTheme
  • config: getBoolPref, getIntPref, nativeThemePref (could be a getBoolPref)
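For mochitest, which is driven from Python, a similar bundle of values could be gathered by the harness itself. Here is a rough sketch of the idea (the function, key, and pref names are hypothetical, not an existing harness API):

# sketch: collect the same categories the reftest sandbox exposes
# (platform, environ, config); all names here are hypothetical
import platform

def build_sandbox(prefs, have_test_plugin=False, is_debug_build=False):
    return {
        # platform
        "os": platform.system(),          # e.g. "Windows", "Linux", "Darwin"
        "xpcomabi": platform.machine(),   # "" would be the ARM case noted above
        # environ
        "haveTestPlugin": have_test_plugin,
        "isDebugBuild": is_debug_build,
        # config: the getBoolPref analog, read from the browser profile
        "nativeThemePref": bool(prefs.get("native.theme.pref", True)),  # hypothetical pref name
    }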

This is the type of information that a large portion of our tests care about.  Most of these options are somehow coded into mochitests as well (through getPref calls, checking the OS, or in Makefile conditions).  I would like to figure out how to add this type of data:

  • orange: list of conditions this is random-orange on (but should pass)
  • fails: list of conditions this is expected to fail on
  • e10s: what service is used to cause this to fail under e10s
  • fennec: does this fail when run on Fennec, on which platforms, and in what versions
  • remote: does this fail when running tests remotely (required for Android, originally for WinMo)
  • results: number of failures (if orange or fails above) and number of todo
  • runtime: expected time to run this test (important on mobile)
  • product: product name and version
  • future: anything we find valuable in the future!

I can think of many ways to add this into the Reftest format or to create a new format.  Looking at this data a bit further, it really is not adding a lot of new information.  If we assume that all tests are expected to pass in all configurations, then any value assigned to a new piece of data indicates that the test fails under that given condition (or list of conditions).  As our supported platforms, configurations, and products grow, we will have a much greater need for this additional metadata.
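To make that concrete, here is a rough Python sketch of the “only record deviations” idea: a test with no entry is assumed to pass everywhere, and each field records the conditions under which that assumption breaks (the field names come from the list above; everything else is hypothetical):

# absence of metadata means "expected to pass in all configurations"
TEST_METADATA = {
    "test_bitrot.html": {
        "fails": ["os == 'Maemo'"],       # expected-fail conditions
        "orange": ["product == 'Firefox' and xr_version == '1.9.1'"],
        "e10s": ["nsIPrefService"],       # service that breaks it under e10s
        "runtime": 12,                    # expected seconds, useful on mobile
    },
    # test_healthy.html has no entry, so it is expected to pass everywhere
}

def expectation(test, sandbox):
    # sandbox is a dict of condition values like the one reftest provides
    meta = TEST_METADATA.get(test, {})
    if any(eval(c, {}, sandbox) for c in meta.get("fails", [])):
        return "fail"
    if any(eval(c, {}, sandbox) for c in meta.get("orange", [])):
        return "random"
    return "pass"

print(expectation("test_bitrot.html",
                  {"os": "Android", "product": "Firefox", "xr_version": "1.9.1"}))  # -> random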

INTEGRATION:

I would like to make all data pieces tags rather than raw conditions (Reftest writes them like C conditions).  This would allow much greater flexibility and make it possible to add data that doesn’t necessarily turn a test on or off.  For example, let’s say a test is a random-orange for Firefox 1.9.1 (not 1.9.2), fails on Fennec Maemo 1.1 only, is orange when testing remotely on Fennec Android, and is currently broken by e10s.  We could easily add those conditions to a list:

fails-if(OS==Maemo) fails-if(e10s==nsIPrefService) random-if(product==Firefox && xr.version==1.9.1) random-if(os==Android && remote==true) test_bitrot.html

So this is doable (please disregard any misused fails-if, random-if statements) and wouldn’t be too hard to add to a reftest.list style format for Mochitest (and even Reftest).  Initially I thought it would be really cool to just run the fails-if, random-if or skip-if statements with a small tweak to the command line.  This would give us the flexibility to turn tests on and off more easily, but I realized that it would turn on/off all tests related to the condition.  I think a small adjustment to the format might allow for tags, and we could tweak a run in the future with little work.  One example might look like:

fails(os=Maemo;e10s=nsIPrefService,nsIDOMGeoGeolocation) random(product=Firefox&xr.version=1.9.1;os=Android&remote=true) test_bitrot.html

This example is a minor change (which might not be needed), but it helps simplify the syntax and keeps the idea of tags in mind.  The backend details would need to change to support a ‘toggle’ of a tag in either scenario.  Maybe we just want to run e10s tests: we can find all tests that have an e10s=nsIPrefService tag inside a fails tag block and run just those specific tests while maintaining all the other rules (skip on a certain OS or product).
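As a sanity check that the tag syntax stays machine readable, here is a rough Python sketch of a parser plus the ‘toggle’ described above.  I am assuming semantics the example only implies: ‘;’ separates independent condition sets, ‘&’ joins key=value pairs, and ‘,’ lists multiple values:

import re

TAG_RE = re.compile(r"(\w+)\(([^)]*)\)")   # matches fails(...), random(...), etc.

def parse_line(line):
    tags = {}
    for name, body in TAG_RE.findall(line):
        tags[name] = body.split(";")       # independent condition sets
    return line.split()[-1], tags          # (test, tags)

def tests_with_tag(lines, tag, key, value):
    # the 'toggle': find every test carrying key=value inside a given tag block
    for line in lines:
        test, tags = parse_line(line)
        for condset in tags.get(tag, []):
            pairs = dict(p.split("=", 1) for p in condset.split("&"))
            if value in pairs.get(key, "").split(","):
                yield test

manifest = ["fails(os=Maemo;e10s=nsIPrefService,nsIDOMGeoGeolocation) random(product=Firefox&xr.version=1.9.1;os=Android&remote=true) test_bitrot.html"]
print(list(tests_with_tag(manifest, "fails", "e10s", "nsIPrefService")))  # -> ['test_bitrot.html']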

There are still questions about whether the Reftest format is the right format for Mochitest.  It carries a lot of weight since it works so well for so many of our tests.


improving personal hygiene by adjusting mochitests

I had signed up to give a talk at the Summit about mochitest manifests, and while tossing around ideas (latest presentation) with my team I realized one huge advantage: the time all developers can save by rearranging the list of tests we run for mochitest.  I know that isn’t too clear, but my talk didn’t make the cut, so this blog post will introduce you to the concept.

The problems we are trying to solve:

  • we have hundreds of random orange tests, and the number keeps growing
  • new products (such as Fennec) render many tests invalid or in need of serious work
  • some tests are better suited to other harnesses (we have already moved some reftests to mochitest)
  • new features (such as Electrolysis) break hundreds of existing tests and require a lot of manual work before running any tests
  • other new features require preferences to be set (like d2d, oop)
  • developers often forget showers and food, which is ok if you are single and work from home

In previous discussions around a ‘manifest’ we have all been hung up on what format it should be in.  Quite frankly, I don’t care what format it is, except that I would like to ensure it is easy for a human to read and write, and that it supports dynamic attributes which can represent various conditions as input or output of the tests themselves.  Currently for mochitest, we have Makefile-based manifests:

[Image: a Makefile used as a manifest for mochitest]

This is not the most useful manifest format, as there is limited support for input conditions, but it works.  We have other great formats available, and there has been great discussion around this topic in other blog posts.  In general the consensus is to keep something like the reftest manifest format and extend it as needed.  I don’t think it is productive to argue about formats: implementing a parser is a trivial task, and the details of the syntax can be hashed out much later.
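To back up the claim that the parser is the easy part, a first pass at reading a reftest-style list fits in a few lines of Python (a sketch only; the real reftest parser also handles includes, the ==/!= reference pairs, and more):

def read_manifest(path):
    for raw in open(path):
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        tokens = line.split()
        yield tokens[:-1], tokens[-1]         # (annotations, test)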

What I want to focus on is the problems outlined above and how we can solve them.  I had done some preliminary work on filtering mochitests for remote testing, which is sort of a partial implementation of a manifest.  My work there used a JSON format, which I would not recommend due to the frustration I experienced editing it by hand.  The concepts I implemented shed light on the value of using tags and other metadata in our tests.  This works great for solving the Fennec and Electrolysis problems, but doesn’t do much for grouping tests by dependencies (like d2d and oop, which need preferences set).  It hints at a lame method to address the orange problem.

Let’s look at test dependencies briefly.  In the above Makefile example, there are two test files that are ignored if MOZ_PHOENIX is defined.  That is an example of grouping based on what I would call an environmental condition (build or platform related).  To group effectively, we might have to support overlapping requirements:

[Image: example of overlapping grouping conditions in a manifest]

There are many ways to solve this.  One method is cascading manifests that overlay each other.  That is not ideal for maintenance, but the concept is useful for supporting all the tests we need to temporarily disable when running on an Electrolysis build.  Another method is more complex tagging.  This could get cumbersome with 5 or 6 grouping definitions, but it would give the most flexibility and granularity.  Whatever the final solution is, I would prefer that it keep logic out of Makefiles and out of the specific tests themselves.
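A minimal sketch of the cascading idea, assuming each manifest simply maps test names to their annotations: a base manifest is applied first, then feature overlays (say, one that temporarily skips tests on an Electrolysis build) override or extend it.  The names and structure here are hypothetical:

def overlay(*manifests):
    merged = {}
    for manifest in manifests:
        for test, annotations in manifest.items():
            merged.setdefault(test, {}).update(annotations)
    return merged

base = {"test_bitrot.html": {"fails": "os=Maemo"}}
e10s = {"test_bitrot.html": {"skip": "e10s=true"},
        "test_prefs.html": {"skip": "e10s=true"}}

runlist = overlay(base, e10s)
# test_bitrot.html keeps its Maemo annotation and gains the e10s skip;
# test_prefs.html is touched only by the overlay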

The last problem to solve is the large list of orange tests.  One creative way to solve this is to create an overlay manifest of all known orange tests.  Then when we run the tests (locally, on try server, on tinderbox), we ignore all known orange tests by default.  This saves a lot of test runtime as well as the time spent investigating the 5-10 failures you see on the tinderbox push log:

[Image: a tinderbox push log listing the builds and tests run for a given changeset, with some known failures]

To be complete, we could create a new orange test type, ‘or’, so we would have ‘Mo(1 2 3 4 5 oth or)’.  This would help prioritize any test failures seen during a run: top priority for 1-5 and oth failures, lower priority for ‘or’ failures.  This is the shift in test lists that I was talking about at the beginning.  The concept of an orange suite also starts to touch on the output part of the manifest, where we can define metadata based on the output or results of a test.  Doing this would reduce the time spent running and investigating tests locally and on try server, saving a lot of time for everybody.
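A sketch of how the run lists might split, assuming the orange overlay is just a set of known-orange test names: the default chunks exclude them, and the new ‘or’ suite runs only them:

def split_suites(all_tests, known_orange):
    normal = [t for t in all_tests if t not in known_orange]
    orange = [t for t in all_tests if t in known_orange]
    return normal, orange

tests = ["test_a.html", "test_bitrot.html", "test_b.html"]
normal, orange = split_suites(tests, {"test_bitrot.html"})
# normal runs in Mo(1 2 3 4 5 oth); orange runs in the lower-priority Mo(or) suite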

This post outlines the problems we can solve with manifests and how we can utilize manifests to solve them.  If these topics and ideas sound interesting, unreasonable, half-baked or otherwise thought provoking, please find me this week at the Summit and let me know what you think.

As for the biggest problem of personal hygiene, it should be obvious that the time saved with the orange test suite must be used for showers, food, haircuts and laundry!
