Category Archives: testdev

I invite you to join me in welcoming 3 new tests to Talos

Two months ago we added session restore tests to Talos; today I am pleased to announce three new tests:

  • media_tests – only runs on linux64 and is our first test to measure audio processing. Many thanks to :jesup and Suhas Nandaku from Cisco.
  • tp5o_scroll – imagine if tscrollx and tp5 had a child: not only do we load the page, we also scroll it. Big thanks go to :avih for tackling this project.
  • glterrain – the first webgl benchmark to show up in Talos. Credit goes to :avih for driving this and delivering it. There are others coming; this was the easiest to get going.


Stay tuned in the coming weeks, as we have two other tests in the works:

  • ts_paint_cold
  • mainthreadio detection



browser-chrome is greener and in many chunks

On Friday we rolled out a big change to split up our browser-chrome tests. It started out as an idea to split the devtools tests into their own suite; after testing, we ended up also chunking the remaining browser-chrome tests into 3 chunks.

No more 200-minute wait times; in fact, we are probably running too many chunks now. A lot of heavy lifting took place, much of it in releng by Armen and Ben, along with a lot of work from Gavin and RyanVM, who pushed hard and proposed great ideas to see this through.

What is next?

There are a few more test cases to fix, and we still need to get all these changes onto Aurora. We also have more (lower-priority) work we want to do on running the tests differently, to help isolate issues where one test affects another test.

In the next few weeks I want to put together a list of projects and bugs that we can work on to make our tests more useful and reliable.  Stay tuned!



smoketest for firefox android on panda boards

Last September the panda boards were deemed ready to run tests.  The next steps were to start integrating them into buildbot and making them 100% automated.  This task turned into a much larger project, and the end result was developing a smoketest which yielded a cleaner integration point with the automation.

The core of the android automation to date has been the NVidia Tegra 250 developer kit.  This has been running quite successfully, with a 3-4% total failure rate (product, test harness, tests, infrastructure, hardware).  Our goal for testing on Android 4.0 was to test on the panda boards, which also have a NEON chipset.  Essentially this should be just like adding more tegra boards to our automation, and for the most part that was true.

The main problems we faced came about when dealing with installing, rebooting, and overall management of the device.  For our tegras, this is all handled by a set of python code called sut_tools.  These sut_tools take care of all the device management, and with a few modifications we were able to reuse them for the panda boards.

While the tests and harnesses ran fine on a panda board at my desk, getting them to work smoothly with the sut_tools and the buildbot scripts proved to be quite a challenge.  After about 10 weeks of solid work and many bugs fixed in the android kernel, system libraries, Firefox, and of course our harnesses, we were able to get this going fairly reliably, with a <10% total failure rate when we first turned these tests on in late December.

In order to prove this was working, we developed a smoketest which would run on the production foopies (the hosts that control the panda boards, 12 at a time) and production panda boards.  In fact this ended up being a way to diagnose boards, test script changes, and help debug overall test failures.  The original smoketest was going to be ‘run some tests on a given panda board for 24 hours’.  The resulting smoketest instead reuses the exact tools we use in automation for cleanup, verification, installation, and uninstalling the product from the device under test.  We also run a set of production mochitests, so we mimic a real job being pushed with about 98% accuracy.

Running the smoketest is pretty easy: it is a single invocation of smoketest.py per board, as in the loop shown below.

While this sounds straightforward, there is a bit more required in order to test a new panda board or, as we normally do, a whole chassis of new panda boards.  As it stands now, I run an instance of smoketest.py in a separate terminal window for every panda I am interested in testing.  Usually this is 6-8 at a time, but it can easily be done for 1 or 12 without concern.

I usually run this in a loop of 100:

  • $ for i in {0..99}; do python smoketest.py; done

Then I grep the logs looking for failure messages, or more specifically count how many success messages I have.  If I have a >95% success rate across all my logs, that is a good sign that things are ready to roll.
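
The bookkeeping over the logs is nothing fancy; here is a minimal sketch of it in python.  The log file pattern and the exact success string are assumptions, not what smoketest.py actually writes.

import glob

# Count runs whose log contains a success marker.
# NOTE: the "smoketest_*.log" pattern and the marker string are placeholders.
logs = glob.glob("smoketest_*.log")
passed = sum(1 for path in logs if "SMOKETEST PASS" in open(path).read())
if logs:
    print "%d of %d runs passed (%.1f%%)" % (passed, len(logs), 100.0 * passed / len(logs))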

In the future, it would be nice to give smoketest.py a better reporting and looping system.  We also need to get to a 99% success rate when running a controlled smoketest.  One thing that would make this easier would be a tool that launches the smoketest on a given set of machines, reports back information, and queries the log files for easier parsing and status.



Mozilla A-Team – How to compare talos numbers from try server to trunk

Have you ever been working on a change that you think will affect performance numbers and you were not sure how to verify the impact of your change?

I have had questions on how to do this and recently I needed to do it myself (as I introduced a change to Talos which caused a big old performance regression in everything).

The main use case was to run a change on Try server and verify that it did in fact fix my performance regression.  Normally I would go to tbpl and click on each of my tests to see the reported number(s).  For each of those test:number sets, I would look at graph server (hint: you can get to graph server for a given test by clicking on the reported number) and verify that my numbers were inside the expected range for that test/platform/branch based on its history.  If only I were part of a software developers union, I could complain that this boring, time-intensive work was not in my contract.

To simplify my life, I decided to automate this with a python script.  I wrote compare.py, which spits out a text-based summary of what I described above.  Here is a sample invocation and its output:

python compare.py --revision c094aeea5f73 --branch Try --masterbranch Firefox --test tp5n --platform Linux
Linux:
    tp5n: 292.157 -> 400.444; 308.596

A quick explanation:

  • 292.157 is the lowest number reported in the last 7 days for tp5n,linux
  • 400.444 is the highest number reported in the last 7 days for tp5n,linux
  • 308.596 is the value reported from my test on try server for tp5n,linux

While this doesn’t look at the previous 30 changesets and the next 5, it gives a pretty good indication of what to expect.  I can run this on a different time range (to check the 7 days prior to my introduced regression) by adding --skipdays to the command line:

python compare.py --revision c094aeea5f73 --branch Try --masterbranch Firefox --test tp5n --platform Linux --skipdays 6
Linux:
    **tp5n: 311.975 -> 398.571; 308.596

Here you will see a “**tp5n”; the ** indicates that the Try server number is not inside the range and should be looked at the old-fashioned way.
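
The flag is presumably nothing more than a range-containment check.  Here is a sketch of that idea in python (not compare.py’s actual code), using the numbers from the output above:

def flag_result(test, low, high, try_value):
    # Prefix the test name with ** when the Try value falls outside the
    # [low, high] range seen on the master branch over the chosen window.
    marker = "" if low <= try_value <= high else "**"
    return "%s%s: %s -> %s; %s" % (marker, test, low, high, try_value)

print flag_result("tp5n", 311.975, 398.571, 308.596)
# prints: **tp5n: 311.975 -> 398.571; 308.596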

Hope this helps in debugging.


Mozilla A-Team – Unraveling the mystery of Talos – part 1 of a googol

Most people at Mozilla have heard of Talos; if you haven’t, Talos is the performance testing framework that runs for every checkin that occurs at Mozilla.

Over the course of the last year I have had the opportunity to extend, modify, retrofit, and rewrite many parts of the harness and tests that make up Talos.

It seems that once or twice a month I get a question about Talos.  Wouldn’t it be nice if I documented Talos?  When Alice was the main owner of Talos, she had written up some great documentation and as of today I am announcing that it has gone through an update:

https://wiki.mozilla.org/Buildbot/Talos.

Stay tuned as there will be more updates to come as we make the documentation more useful.


work in progress – turning on reftests in the new native UI for firefox on android

The latest builds of Mobile Firefox are switching to a Java-based UI, which means the tests that depend on a traditional window environment and a XUL backend will most likely fail.  In general we have mochitests and some talos tests running, but reftests are a huge piece of testing that hasn’t been working.

In bug 704509, I have a patch to get reftests working with the Java front end.  This is really just using the XUL backend, but making it work with the limited support we have for addons and XUL.  Here are some of the differences:

  • I am using a bootstrapped extension
  • the reftest code needs to specify the window and document we are using
  • I am not using a command-line handler; all options are set as preferences
  • There is a nasty hack to attach our reftest <browser> to the default <window>

I still need to make this work with our current reftest harness for Firefox, so most of these changes will need to be cleaned up to work in a way that is acceptable to everybody and minimizes the special-case hacking for android.


Android automation is becoming more stable: ~7% failure rate

At Mozilla we have made our unit testing on android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except when a change genuinely breaks a test, but in reality we allow for some random failures and infrastructure failures. Our current goal is a 5% failure rate.

So what are these acceptable failures, and what does 5% really mean? Failures can happen when we have tests which fail randomly, usually poorly written tests or tests which were written a long time ago and hacked to work in today’s environment. This doesn’t mean any test that fails is a problem; it could be a previous test that changes a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly, or it crashed in the middle of the test. Other failures are the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems. With our current state of testing this mostly falls into the category of losing connectivity to the device. Infrastructure problems are indicated as Red or Purple, and test-related problems as Orange.

I took a look at the last 10 runs on mozilla-central (the branch we build Firefox nightlies from) and built this little graph:

[Chart: Firefox Android Failures]

Here you can see that our tests are responsible for a 6.67% failure rate, and that overall we can expect a failure 12.33% of the time on Android.

We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in.  I did the same thing here:

[Chart: mozilla-inbound Android Failures]

Here you can see that our tests are responsible for a 7.77% failure rate, and that overall we can expect a failure 9.89% of the time on Android.
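
Assuming both percentages are computed over the same set of jobs – the first counting only test (orange) failures and the second counting test plus infrastructure (red/purple) failures – the arithmetic is just the following sketch, with made-up job counts rather than the real numbers behind the graphs:

# Hypothetical counts for the sampled pushes -- illustrative only.
test_failures = 7      # orange jobs (test-related)
infra_failures = 2     # red/purple jobs (infrastructure)
total_jobs = 90

print "test failure rate:    %.2f%%" % (100.0 * test_failures / total_jobs)
print "overall failure rate: %.2f%%" % (100.0 * (test_failures + infra_failures) / total_jobs)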

This is only a small sample of the tests, but it should give you a good idea of where we are.


notes on a python webserver

Last week I created a python webserver as a patch for make talos-remote.  This ended up being fraught with performance issues, so I started looking into it.  I based it on the profileserver.py that we have in mozilla-central, and while it worked, I found my tp4 tests were timing out.

It turns out we were using a synchronous webserver, which is easy to fix with a ThreadingMixIn, just like the chromium perf.py script does:

from SocketServer import ThreadingMixIn   # gives the server a thread per request
import BaseHTTPServer

class MyThreadedWebServer(ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass

Now the test was finishing, but very very slowly (20+ minutes vs <3 minutes).  After doing a CTRL+C on the webserver, I saw a lot of requests hanging on log_message and gethostbyaddr() calls.  So I ended up overriding address_string() and log_message(), and things worked.

import SimpleHTTPServer

class MozRequestHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    # Skip the reverse DNS (gethostbyaddr) lookup; on my local network
    # these calls were timing out.
    def address_string(self):
        return "a.b.c.d"

    # Logging every request produces a LOT of noise, so silence it.
    def log_message(self, format, *args):
        pass
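
For completeness, wiring the two classes together looks roughly like this; the port and document root are assumptions, not necessarily what the talos-remote patch uses:

import os

os.chdir("/path/to/talos/pages")   # hypothetical document root
httpd = MyThreadedWebServer(("0.0.0.0", 8080), MozRequestHandler)
httpd.serve_forever()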

Now tp4m runs as fast as using apache on my host machine.


converting xpcshell from listing directories to a manifest

Last year we ventured down the path of adding test manifests for xpcshell in bug 616999.  Finding a manifest format is not easy, because there are plenty of objections to any given format, its syntax, and its relevance to the project at hand.  At the end of the day, we depend too much on our build system to filter tests, and beyond that we have hardcoded data in tests or harnesses to run or ignore them based on certain criteria.  So for xpcshell unittests we have added a manifest, so we can start to keep track of all these tests and stop depending on iterating directories and sorting or reverse-sorting head and tail files.

The first step is to get a manifest format in place for all existing tests.  This landed today in bug 616999 and is currently on mozilla-central.  It requires that every test file in a directory be listed in the manifest and that every file listed in the manifest exist in the directory (verified at make time).  Basically, if you do a build, it will error out if you forget to add a manifest or forget to add a test file to the manifest.  Pretty straightforward.

The manifest we have chosen is the ini format from mozmill.  We found that there is no silver bullet for a perfect test manifest, which is why we chose an existing format that met the needs of xpcshell.  It is easy to hand edit (as opposed to json) and easy to parse from both python and javascript.  Compared to reftests, which have a custom manifest format, we just needed a list of test files and, more specifically, a way to associate a head and tail script file with them (not easy with reftest manifests).  The format might not work for everything, but it gives us a second format to work with depending on the problem we are solving.
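
To give a feel for the format, here is a minimal hand-written sketch of such a manifest; the file names are made up and the exact keys may differ from what actually landed:

[DEFAULT]
head = head_helpers.js
tail = tail_cleanup.js

[test_feature_basic.js]
[test_feature_edge_cases.js]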


Some notes about adding new tests to talos

Over the last year and a half I have been editing the talos harness for various bug fixes, but just recently I have needed to dive in and add new tests and pagesets to talos for Firefox and Fennec.  Here are some of the things I didn’t realize, or had inconveniently forgotten, about what goes on behind the scenes.

  • tp4 is really 4 tests: tp4, tp4_nochrome, tp4_shutdown, tp4_shutdown_nochrome.  This is because in the .config file we have “shutdown: true”, which adds _shutdown to the test name, and running with --noChrome adds _nochrome to the test name.  The same goes for any test that is run with the shutdown=true and nochrome options (see the sketch after this list).
  • when adding new tests, we need to add the test information to the graph server (staging and production).  This is done in the hg.mozilla.org/graphs repository by adding to data.sql.
  • when adding new pagesets (as I did for tp4 mobile), we need to provide a .zip of the pages and the pageloader manifest to release engineering, as well as modify the .config file in talos to point to the new manifest file.  see bug 648307
  • Also when adding new pages, we need to add sql for each page we load.  This is also in the graphs repository, in pages_table.sql.
  • When editing the graph server, you need to file a bug with IT to update the live servers and attach a sql file (not a diff).   Some examples: bug 649774 and bug 650879
  • after the graph servers are updated, the staging run is green, and review is done, you can check in the patch for talos
  • For new tests, you also have to create a buildbot config patch to add the testname to the list of tests that are run for talos
  • the last step is to file a release engineering bug to update talos on the production servers.  This is done by creating a .zip of talos, posting it on a ftp site somewhere and providing a link to it in the bug.
  • one last thing is to make sure the bug to update talos has an owner and is looked at; otherwise it can sit for weeks with no action!
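
To illustrate the first point about test names, the reported names are just the base test name with suffixes added on.  This is a sketch of the idea in python, not talos’s actual code:

def reported_names(base, shutdown=False, nochrome=False):
    # "shutdown: true" in the .config adds a _shutdown variant,
    # and running with --noChrome adds a _nochrome variant.
    names = [base]
    if shutdown:
        names += [name + "_shutdown" for name in names]
    if nochrome:
        names += [name + "_nochrome" for name in names]
    return names

print reported_names("tp4", shutdown=True, nochrome=True)
# ['tp4', 'tp4_shutdown', 'tp4_nochrome', 'tp4_shutdown_nochrome']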

This is my experience from getting ts_paint, tpaint, and tp4m (mobile only) tests added to Talos over the last couple months.
