Fennec currently has thousands of failures when running the full set of unittests. As it stands, when tinderbox runs them, the only “pass” criterion is that total failures <= the acceptable number of failures. As you can imagine, this leaves a lot of room for improvement.
Enter the LogCompare tool, which happyhans and I have been working on, with help from mikeal on the couchdb backend. We take the tinderbox log file, parse it, and upload the results to a database. This gives us a list of every test that was run and whether it passed or failed, so we can compare runs test by test and see what is fixed, what is a known failure, and what is a new failure. Even better, we are running Mochitests in parallel on 4 different machines, and LogCompare can tell us whether the tests on machine1 passed or failed without waiting for the other machines to finish. Another bonus is that we can track a specific test over time to spot random orange (intermittent failures).
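To make the test-by-test comparison concrete, here is a minimal sketch in Python. It is not the actual LogCompare code: the function name `compare_runs`, the dictionary layout (filename mapped to pass/fail), and the extra "added"/"removed" buckets are all assumptions made for illustration.

```python
def compare_runs(baseline, current):
    """Classify each test file by comparing a baseline run to the current run.

    Both arguments are hypothetical dicts mapping a test filename to
    True (passed) or False (failed).
    """
    report = {
        "fixed": [],          # failed in baseline, passes now
        "known_failure": [],  # failed in baseline, still fails
        "new_failure": [],    # passed in baseline, fails now
        "passing": [],        # passed in both runs
        "added": [],          # only present in the current run
        "removed": [],        # only present in the baseline run
    }
    for name, passed in current.items():
        if name not in baseline:
            report["added"].append(name)
        elif baseline[name] and passed:
            report["passing"].append(name)
        elif baseline[name] and not passed:
            report["new_failure"].append(name)
        elif passed:
            report["fixed"].append(name)
        else:
            report["known_failure"].append(name)
    for name in baseline:
        if name not in current:
            report["removed"].append(name)
    # Sort each bucket so the report is stable from run to run.
    for bucket in report.values():
        bucket.sort()
    return report


baseline = {"a.html": True, "b.html": False, "c.html": False, "gone.html": True}
current = {"a.html": False, "b.html": True, "c.html": False, "new.html": True}
report = compare_runs(baseline, current)
```

With a report like this, the pass criterion could become "the `new_failure` list is empty" rather than a threshold on the total failure count.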
The concept is simple; here are some of the details and caveats:
- We track tests by test filename, not by directory or test suite.
- A single file can contain many tests (mochitest), so there is no clean way to track each specific test.
- If a test fails, later tests (sometimes in the same file, folder, or suite) are skipped.
- Parsing the log file is a nasty task with many corner cases.
- To match test names up correctly, we need to strip out full paths and compare only the relative path/filename.
- We need to handle new tests being added and existing ones being removed.
- We need a baseline from Firefox for the full list of tests and counts.
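Two of these caveats, stripping full paths and tracking by filename, can be sketched together in Python. This is only an illustration under stated assumptions: the `/tests/` marker, the example paths, and the function names `relative_test_name` and `summarize_by_file` are hypothetical and are not taken from the real tinderbox logs or from LogCompare.

```python
from collections import defaultdict


def relative_test_name(path, marker="/tests/"):
    """Reduce a machine-specific full path to a relative test name.

    `marker` is an assumed directory that separates the build-specific
    prefix from the part of the path shared by every machine.
    """
    path = path.replace("\\", "/")  # treat Windows and Unix paths alike
    index = path.find(marker)
    if index != -1:
        path = path[index + len(marker):]
    return path.lstrip("/")


def summarize_by_file(results):
    """Count passes and failures per test file.

    `results` is an iterable of (path, passed) pairs, one per individual
    check, so a file containing many checks collapses into one entry.
    """
    counts = defaultdict(lambda: {"pass": 0, "fail": 0})
    for path, passed in results:
        name = relative_test_name(path)
        counts[name]["pass" if passed else "fail"] += 1
    return dict(counts)


results = [
    ("/builds/machine1/tests/dom/test_a.html", True),
    ("C:\\slave\\tests\\dom\\test_a.html", False),
    ("/builds/machine2/tests/layout/test_b.html", True),
]
summary = summarize_by_file(results)
```

Keeping per-file pass and fail counts, rather than a single boolean, also makes it easier to notice when an early failure caused later checks in the same file to be skipped, since the total will be lower than the Firefox baseline count.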
The goal here is to keep it simple while bringing the total failure count of the unittests on Fennec to zero!