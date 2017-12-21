Over the years we have had great dreams of running our tests in many different ways. There was a dream of ‘hyperchunking’, where we would run everything in hundreds of chunks and finish all the tests in just a couple of minutes. That idea is difficult for many reasons, so we shifted to ‘run-by-manifest’. We sort of do this now for mochitest, but not for web-platform-tests, reftest, or xpcshell. Both models require work on how we schedule tests and report data. That isn’t too hard to solve, but it does mean a lot of additional work and supporting two models in parallel for some time.
In recent times, there has been an ongoing conversation about ‘run-by-component’. Let me explain. All files in the tree are mapped to Bugzilla components, and almost all manifests have a clean list of tests that map to the same component. Why not schedule, run, and report our tests by that same Bugzilla component?
I got excited near the end of the Austin work week as I started working on this to see what would happen.
This is hand crafted to show top level products, and when we expand those products you can see all the components:
I just used the first 3 letters of each component until there was a conflict, then I hand edited exceptions.
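A rough sketch of that naming step, assuming plain component-name strings as input. The `code_names` helper, its `overrides` argument, and the sample component names are hypothetical and only illustrate the idea; this is not the code from my patch. It takes the first three letters of each name, reports any prefix that more than one component wants, and lets hand-edited exceptions win:

```python
from collections import defaultdict


def code_names(components, overrides=None, length=3):
    """Map each component name to a short code, reporting prefix conflicts."""
    overrides = overrides or {}
    by_prefix = defaultdict(list)
    for name in components:
        if name in overrides:
            continue  # hand-edited exception, handled below
        # Keep only letters and digits so punctuation never ends up in a code.
        cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
        by_prefix[cleaned[:length]].append(name)

    codes = dict(overrides)
    conflicts = {}
    for prefix, names in by_prefix.items():
        if len(names) == 1 and prefix not in overrides.values():
            codes[names[0]] = prefix
        else:
            # More than one component wants this code (or an override took it).
            conflicts[prefix] = names
    return codes, conflicts


# Sample names for illustration only.
components = ["Networking", "Networking: HTTP", "Layout", "Graphics"]
codes, conflicts = code_names(components, overrides={"Networking: HTTP": "http"})
print(codes)
print(conflicts)
```

Anything that lands in `conflicts` is what would still need a manual exception.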
What is great here is that we can easily schedule networking-only tests:
and what you would see is:
^ Keep in mind that in this example I am using the same push and just filtering the results. I did also test on a smaller scale for a bit with just Core-networking until I got it working.
What would we use this for:
- collecting code coverage on components instead of random chunks which will give us the ability to recommend tests to run with more accuracy than we have now
- tools like SETA will be more deterministic
- developers can filter in treeherder on their specific components and see how green they are, etc.
- easier backfilling of intermittents for sheriffs as tests are not moving around between chunks every time we add/remove a test
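To illustrate the last point, here is a small sketch of why tests move between chunks but not between components. The even split in `chunk_evenly` is a simplification I am assuming for illustration, not the harness's actual chunking algorithm, and the test paths and the directory-based `component_of` function are made up:

```python
def chunk_evenly(tests, total_chunks):
    """Split a sorted test list into contiguous chunks of near-equal size."""
    tests = sorted(tests)
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


def chunk_of(test, chunks):
    """Return the 1-based chunk number that contains the test."""
    for number, chunk in enumerate(chunks, start=1):
        if test in chunk:
            return number
    raise KeyError(test)


def group_by_component(tests, component_of):
    """Group tests by whatever component_of() says they belong to."""
    groups = {}
    for test in sorted(tests):
        groups.setdefault(component_of(test), []).append(test)
    return groups


def component_of(test):
    # Hypothetical stand-in: the top-level directory plays the role of the component.
    return test.split("/")[0]


before = [
    "dom/test_a.js", "dom/test_b.js", "dom/test_c.js",
    "netwerk/test_d.js", "netwerk/test_e.js",
]
after = before + ["dom/test_a2.js"]  # one new test lands in the tree

print(chunk_of("dom/test_c.js", chunk_evenly(before, 2)))
print(chunk_of("dom/test_c.js", chunk_evenly(after, 2)))
print(group_by_component(after, component_of)["dom"])
```

Adding a single test shifts `dom/test_c.js` into a different chunk, while its component group only gains the new entry.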
While I am excited about the 4 reasons above, this is far from being production ready. There are a few things we would need to solve:
- My current patch takes a list of manifests associated with bugzilla components and runs all manifests related to that component. We would need to sanitize all manifests so that each only has tests related to one component (or solve this differently)
- My current patch iterates through all possible test types. This is grossly inefficient, but it is the best I could do with mozharness. I suspect that with a little more work I could have reftest and xpcshell working, and likewise web-platform-tests. Ideally we would run all tests from a source checkout and use |./mach test <component>| and it would find what needs to run
- What do we do when we need to chunk certain components? Right now I hack on taskcluster to duplicate a ‘component’ test for each component in a .json file. We also cannot specify platform-specific features, so we lose a lot of the functionality that we gain with taskcluster. I assume some simple thought and a feature or two would allow us to retain all the features of taskcluster with the simplicity of component based scheduling
- We would need a concrete method for defining the list of components (#2 solves this for the harnesses). Currently I add raw .json into the taskcluster decision task since it wouldn’t find the file I had checked into the tree when I pushed to try. In addition, finding the right code names and mappings would ideally be automatic, but might need to be a manual process.
- When we run tests in parallel, they will have to be different ‘platforms’ such as linux64-qr and linux64-noe10s. This is much easier in the land of taskcluster, but it is a shift from how we currently do things.
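The manifest sanitizing problem from the first item can be sketched roughly as follows. This assumes we already have two plain dictionaries, one from manifest to its tests and one from test to its component; the names `manifest_tests`, `component_of`, and `schedule_by_component` and all the sample paths are hypothetical, and nothing here calls into mozharness or manifestparser:

```python
from collections import defaultdict


def components_per_manifest(manifest_tests, component_of):
    """Return {manifest: set of components that its tests map to}."""
    return {
        manifest: {component_of[test] for test in tests}
        for manifest, tests in manifest_tests.items()
    }


def schedule_by_component(manifest_tests, component_of):
    """Split manifests into a per-component schedule and a cleanup list."""
    schedule = defaultdict(list)
    mixed = {}
    per_manifest = components_per_manifest(manifest_tests, component_of)
    for manifest, comps in sorted(per_manifest.items()):
        if len(comps) == 1:
            schedule[next(iter(comps))].append(manifest)
        else:
            # Not exactly one component: needs sanitizing first.
            mixed[manifest] = sorted(comps)
    return dict(schedule), mixed


# Made-up sample data.
manifest_tests = {
    "netwerk/test/mochitest.ini": ["netwerk/test/test_a.html", "netwerk/test/test_b.html"],
    "dom/tests/mochitest.ini": ["dom/tests/test_c.html", "dom/tests/test_d.html"],
}
component_of = {
    "netwerk/test/test_a.html": "Core::Networking",
    "netwerk/test/test_b.html": "Core::Networking",
    "dom/tests/test_c.html": "Core::DOM",
    "dom/tests/test_d.html": "Core::Networking",
}

schedule, mixed = schedule_by_component(manifest_tests, component_of)
print(schedule)
print(mixed)
```

Manifests that end up in `mixed` are the ones that would need splitting, or some other answer, before a pure component based schedule is possible.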
This is something I wanted to bring visibility to, since many see this as the next stage of how we test at Mozilla. I am glad for tools like taskcluster, mozharness, and common mozbase libraries (especially manifestparser), which have made this a simple hack. There is still a lot to learn here. We do see a lot of value in going this direction, but so far we have been looking for value and not for dangers. What problems do you see with this approach?