Tag Archives: mozilla

Android automation is becoming more stable: ~7% failure rate

At Mozilla we have made unit testing on Android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except for code that breaks a test, but in reality we allow for random failures and infrastructure failures. Our current goal is 5%.

So what are these acceptable failures, and what does 5% really mean? Failures can happen when we have tests which fail randomly, usually poorly written tests or tests which were written a long time ago and hacked to work in today’s environment. This doesn’t mean that the failing test is always the problem; it could be a previous test that changed a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly, or it crashed in the middle of the test. Other failures are the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems. With our current state of testing, these mostly fall into the category of losing connectivity to the device. Infrastructure problems are indicated as Red or Purple, and test-related problems as Orange.

I took a look at the last 10 runs on mozilla-central (where we build Firefox nightlies from) and built this little graph:

Firefox Android Failures


Here you can see that test failures show up in 6.67% of the runs, and that overall we can expect some kind of failure on Android 12.33% of the time.

We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in.  I did the same thing here:

mozilla-inbound Android Failures


Here you can see that test failures show up in 7.77% of the runs, and that overall we can expect some kind of failure on Android 9.89% of the time.

This is only a small sample of the tests, but it should give you a good idea of where we are.
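As a rough illustration of how percentages like these can be computed, here is a minimal Python sketch. The job results below are invented for the example (they are not the data behind the graphs), and the color names follow the convention described above: orange for test failures, red or purple for infrastructure problems.

```python
from collections import Counter

# Hypothetical job results, one color per job; NOT real tinderbox data.
results = ["green"] * 26 + ["orange"] * 2 + ["red", "purple"]


def failure_rates(results):
    """Return (test_rate, infra_rate, total_rate) as percentages of all jobs."""
    counts = Counter(results)
    total = len(results)
    test = counts["orange"]                   # test-related failures
    infra = counts["red"] + counts["purple"]  # infrastructure failures

    def pct(n):
        return round(100.0 * n / total, 2)

    return pct(test), pct(infra), pct(test + infra)


test_rate, infra_rate, total_rate = failure_rates(results)
print(test_rate, infra_rate, total_rate)
```

The important design choice is the denominator: every rate is a share of all jobs, so the test rate and the infrastructure rate add up to the overall chance of seeing a failure.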


Filed under testdev

Professional Development, Improv and your audience

I had the opportunity to attend some really exciting professional development sessions at the All Hands.  Personally I found these very interesting, but I heard a lot of grumbling that they did not add much value and were not of interest.

One reason I found these interesting is that in a previous life I attended a few years of Improv acting classes and did a short stint of real onstage Improv acting.  These professional development sessions reminded me of the core concepts we learned in Improv 101.  So if you felt that you missed out, sign up for an Improv class.  Maybe if there are professional development sessions at a future event they could just have an Improv acting class.

Related to the professional development courses, I found that most of these were sparsely attended.  The courses received great reviews and ratings from those who did attend.  To be fair, the technical tracks that I attended had about the same attendance as the professional development tracks.  Maybe we are not creating sessions that are of interest to our audience?  I know for the technical tracks we just propose something and it magically becomes a session.  I don’t recall being asked for any input on what sessions would be available to me.  Maybe in the future we can do a better job of getting input from the community (a.k.a. the audience)!


Filed under general, reviews

converting xpcshell from listing directories to a manifest

Last year we ventured down the path of adding test manifests for xpcshell in bug 616999.  Finding a manifest format is not easy because there are plenty of objections to the format, syntax and relevance to the project at hand.  At the end of the day, we depend too much on our build system to filter tests and after that we have hardcoded data in tests or harnesses to run or ignore based on certain criteria.  So for xpcshell unittests, we have added a manifest so we can start to keep track of all these tests and not depend on iterating directories and sorting or reverse sorting head and tail files.

The first step is to get a manifest for all existing tests.  This landed today in bug 616999 and is currently on mozilla-central.  It requires that all test files in a directory be in the manifest file and that the manifest file include all test files in the directory (verified at make time).  Basically, a build will error out if you forget to add a manifest, or forget to add a test file to the manifest.  Pretty straightforward.
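To make the make-time check concrete, here is a minimal Python sketch of the idea. It is not the code that landed in bug 616999: the `check_manifest` helper, the `test_*.js` naming rule, and the second check (entries listed in the manifest but missing on disk) are all assumptions made for illustration.

```python
import os
import tempfile


def check_manifest(test_dir, manifest_entries):
    """Compare test files on disk with the names listed in a manifest.

    Returns (unlisted, stale): test files missing from the manifest, and
    manifest entries with no matching file. The test_*.js naming rule is
    an assumption for this sketch.
    """
    on_disk = {
        name for name in os.listdir(test_dir)
        if name.startswith("test_") and name.endswith(".js")
    }
    listed = set(manifest_entries)
    return sorted(on_disk - listed), sorted(listed - on_disk)


# Usage against a throwaway directory.
with tempfile.TemporaryDirectory() as test_dir:
    for name in ("head.js", "test_a.js", "test_b.js"):
        open(os.path.join(test_dir, name), "w").close()
    unlisted, stale = check_manifest(test_dir, ["test_a.js", "test_c.js"])

print(unlisted, stale)
```

A build step could fail whenever either list is non-empty, which is the "error out if you forget" behavior described above.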

The manifest format we have chosen is the ini format from mozmill.  We found that there is no silver bullet for a perfect test manifest, which is why we chose an existing format that met the needs of xpcshell.  It is easy to hand edit (as opposed to JSON) and easy to parse from Python and JavaScript.  Compared to reftests, which have a custom manifest format, we just needed a list of test files and, more specifically, a way to associate head and tail script files with them (not easy with reftest manifests).  The format might not work for everything, but it gives us a second format to work with depending on the problem we are solving.
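As a sketch of why an ini layout fits this problem, the snippet below reads a small manifest with Python's standard `configparser` module. The manifest contents, file names, and the `head`/`tail` key layout are hypothetical, and `configparser` is only a stand-in here, not the parser the xpcshell harness uses.

```python
import configparser

# Hypothetical manifest: one section per test file, with optional
# head/tail scripts and defaults shared by every test.
MANIFEST = """\
[DEFAULT]
head = head_common.js
tail =

[test_first.js]

[test_second.js]
head = head_common.js head_second.js
tail = tail_second.js
"""


def load_tests(text):
    """Return one dict per test section with its head and tail scripts."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    tests = []
    for name in parser.sections():  # sections() skips [DEFAULT]
        section = parser[name]
        tests.append({
            "name": name,
            "head": section.get("head", "").split(),
            "tail": section.get("tail", "").split(),
        })
    return tests


tests = load_tests(MANIFEST)
for test in tests:
    print(test["name"], test["head"], test["tail"])
```

Each test picks up the shared defaults unless its own section overrides them, which is the head/tail association that is awkward to express in a reftest-style list.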


Filed under testdev

Some notes about adding new tests to talos

Over the last year and a half I have been editing the talos harness for various bug fixes, but just recently I have needed to dive in and add new tests and pagesets to talos for Firefox and Fennec.  Here are some of the things I didn’t realize, or had inconveniently forgotten, about what goes on behind the scenes.

  • tp4 is really 4 tests: tp4, tp4_nochrome, tp4_shutdown, tp4_shutdown_nochrome.  This is because in the .config file we have “shutdown: true”, which adds _shutdown to the test name, and running with --noChrome adds _nochrome to the test name.  The same goes for any test that is run with the shutdown=true and nochrome options.
  • When adding new tests, we need to add the test information to the graph server (staging and production).  This is done in the hg.mozilla.org/graphs repository by adding to data.sql.
  • When adding new pagesets (as I did for tp4 mobile), we need to provide a .zip of the pages and the pageloader manifest to release engineering, as well as modify the .config file in talos to point to the new manifest file.  See bug 648307.
  • Also when adding new pages, we need to add sql for each page we load.  This is also in the graphs repository, but in pages_table.sql.
  • When editing the graph server, you need to file a bug with IT to update the live servers and attach a sql file (not a diff).  Some examples: bug 649774 and bug 650879.
  • After you have the graph servers updated, a green staging run, and the review done, you can check in the patch for talos.
  • For new tests, you also have to create a buildbot config patch to add the test name to the list of tests that are run for talos.
  • The last step is to file a release engineering bug to update talos on the production servers.  This is done by creating a .zip of talos, posting it on an ftp site somewhere, and providing a link to it in the bug.
  • One last thing is to make sure the bug to update talos has an owner and is looked at, otherwise it can sit for weeks with no action!
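The naming rule from the first bullet is easy to lose track of, so here is a small Python sketch of it. This is an illustration of the rule as described above, not code from the talos harness; the function names are made up.

```python
def talos_test_name(base, shutdown=False, nochrome=False):
    """Build the reported test name from the base name and run options."""
    name = base
    if shutdown:   # "shutdown: true" in the .config file
        name += "_shutdown"
    if nochrome:   # run without chrome
        name += "_nochrome"
    return name


def all_variants(base):
    """List every name a single test can be reported under."""
    return [
        talos_test_name(base, shutdown, nochrome)
        for shutdown in (False, True)
        for nochrome in (False, True)
    ]


print(all_variants("tp4"))
```

Every one of these names needs its own entry on the graph server, which is why a single new test can turn into several rows in data.sql.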

This is my experience from getting ts_paint, tpaint, and tp4m (mobile only) tests added to Talos over the last couple months.


Filed under testdev

Orange Factor and the WOO-Tang Clan

I silently put up a tool called Orange Factor early last month as part of the War On Orange (WOO) project.  Over the last few weeks I have been iterating on this and working with jgriffin, jhammel, mcote and ctalbert (some have referred to us as the WOO-tang clan) to make it more useful and accurate.  Let me outline a few features of the site to give you a general introduction.

To start off: I know it takes a long time to load, but it should load in under 30 seconds.  All the data is collected from bugs that are blocking randomorange.  This is done by parsing the comments and linked tinderbox logs to determine the frequency and type of failure.  We display a graph that tracks the cumulative orange factor (failures/push) over time.  NOTE: we are going off the number of pushes, not the number of tests run.

 


Graph of the Orange Factor over time
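For readers who want the metric spelled out, here is a minimal Python sketch of failures per push. The daily counts are invented, and treating "cumulative" as running totals of failures and pushes is my assumption for this sketch, not necessarily how the site computes its graph.

```python
def orange_factor(failures, pushes):
    """Failures per push; pushes (not test runs) are the denominator."""
    if pushes == 0:
        return 0.0
    return failures / pushes


# Hypothetical (day, failures, pushes) counts; NOT real Orange Factor data.
daily = [("day 1", 12, 4), ("day 2", 9, 6), ("day 3", 20, 5)]


def cumulative_orange_factor(daily):
    """Return (day, orange factor over all days so far) for each day."""
    total_failures = 0
    total_pushes = 0
    series = []
    for day, failures, pushes in daily:
        total_failures += failures
        total_pushes += pushes
        series.append((day, orange_factor(total_failures, total_pushes)))
    return series


series = cumulative_orange_factor(daily)
for day, value in series:
    print(day, round(value, 2))
```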

Next there is the Heatmap.  This is similar to what you see on tinderboxpushlog, except it is color coded by the number of failures.

Overall HeatMap


From the HeatMap, you can click on a specific value to see more details about that test run (in the time range).  For example, here is OSX MoOth:

OSX MOth Testrun


Ok, this is really cool.  You can click on each day to filter down to that specific day, and at the top you can see the drop-down select boxes.  This is super awesome because you can slice and dice the data to view it just how you want.

Next I want to show you what the view looks like for a specific day.  On the left hand side of the webpage is a calendar; you can click any day (I clicked Sept 11th), or click the day on a test run or orange factor graph (hover your mouse over the graph and a link will show up).

Daily Test Results for all tests by Platform


You should get the point that there are many ways to view the data.  Actually, probably too much information!  So lately we have been working on some bug-centric views.  To start off with, we have a topfails-style report, but it is based on bugs, not failures in log files.  To get there, click on the “Research and Top Bugs” link on the right hand side of the page.  Here is a “weekly” view showing the top 5 bugs per week:

Top 5 Bugs every 7 days


Hover over the color bars to see the bug number and research it in more detail.  Here is what you see when viewing a specific bug (544601):

Individual Bug Graph over Time


Orange Factor has much more to offer, just poke around and see how you can make it useful.  Feedback is welcome, and feel free to ask any questions in #ateam!


Filed under testdev

tests that require privileged access

I have been working on a project to get mochitests running on a build of Fennec + electrolysis.  In general, you can follow along in bug 567417.

One of the large TODO items in getting the tests to run is actually fixing the tests which use UniversalXPConnect.  So my approach was to grep through a mochitest tests/ directory for @mozilla and parse it out.  With a few corner cases, this resulted in a full list of services we utilize from our tests (here is a list sorted by frequency, 76 services in total).  Cool, but that didn’t seem useful enough.  Then I took the work I had done for filtering (the json file) and cross-referenced it with my original list of tests that use UniversalXPConnect.

Now I have a list of 59 services which should all pass in Fennec (a mozilla-central build from 2 weeks ago on n900), along with the filename of the first test which utilizes each service!
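The grep-and-count step can be sketched in a few lines of Python. This is not the script I used; the regular expression, the `count_services` helper, and the sample file contents are assumptions for illustration, and it counts every occurrence rather than one per test.

```python
import re
from collections import Counter

# Contract IDs look like "@mozilla.org/preferences-service;1".
CONTRACT_ID = re.compile(r"@mozilla\.org/[A-Za-z0-9_./\-]+;\d+")


def count_services(sources):
    """sources maps filename -> file text.

    Returns (counts, first_use): how often each contract ID appears, and
    the first file (in sorted filename order) that uses it.
    """
    counts = Counter()
    first_use = {}
    for filename in sorted(sources):
        for contract_id in CONTRACT_ID.findall(sources[filename]):
            counts[contract_id] += 1
            first_use.setdefault(contract_id, filename)
    return counts, first_use


# Hypothetical test file contents standing in for a real tests/ directory.
sources = {
    "test_a.html": 'Cc["@mozilla.org/preferences-service;1"]; '
                   'Cc["@mozilla.org/network/io-service;1"];',
    "test_b.html": 'Components.classes["@mozilla.org/preferences-service;1"];',
}

counts, first_use = count_services(sources)
for contract_id, n in counts.most_common():
    print(n, contract_id, first_use[contract_id])
```

For a real tree, the `sources` mapping could be filled by walking the tests/ directory and reading each file.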

What else would be useful?


Filed under testdev

accessing privileged content from a normal webpage, request for example

One project which I am working on is getting mochitests to run in fennec + electrolysis.  The reason this is a project is that we no longer allow web pages to access privileged information in the browser.  Mochitests in the simplest form use file logging and event generation.

The communication channel that is available between the ‘chrome’ and ‘content’ processes is the messageManager protocol.  There are even great examples of using this in the unit tests for ipc.  Unfortunately I have not been able to load a normal web page and have my extension, which uses messageManager calls, interact with it.

I think what would be nice to see is a real end-to-end example of an extension that demonstrates this functionality on any given webpage.  This would be helpful to all addon authors, not just me :)  If I figure this out, I will update with a new blog post.


Filed under testdev

filtering mochitests for remote testing

One major problem we encounter in running all our unittests on Fennec is the large volume of failures and how to manage them.  Currently we have turned off mochitests on the Maemo tinderbox because nobody looked at the results (we still run reftest/crashtest/xpcshell!)

In many of my previous posts, I outlined methods for running tests remotely and that has proven to be very useful.  In order to test this code and continue developing it (without windows mobile or a working android build yet), I have developed a simple python test-agent that can run on a linux box (including n900.)  If you are curious, check it out and watch tests run remotely…it is pretty cool.

So the real problem I need to solve is how to not run a list of tests on a mobile device.  Solving this could get us to green faster and cut the mochitest runtime in half!

In 2008 my solution was Maemkit.  Maemkit is just a small wrapper around the python test runner scripts that does some file (renaming) and directory (splitting into smaller chunks) manipulation to allow for more reliable test runs.  This has worked great and still works.  Enter remote testing, and we need to hack up Maemkit a lot to accommodate everything.  In addition, a lot of the work Maemkit does is already in the test runners.

Today I have moved the filtering to be a bit more configurable and less obscure.  This is really just a prototype and toolset to solve a problem for me locally, but the idea is something worth sharing.  What I have done is built up a json file (most of this was done automatically with this parsing script) which outlines the test and has some ‘tags’ that I can filter on:

   {
     "fennec-tags" : {"orange": "", "remote": "", "timeout": ""},
     "name" : "/tests/toolkit/content/tests/widgets/test_tooltip.xul"
   },
   {
     "fennec-results" : {"fail": 0, "todo": 0, "pass": 51},
     "name" : "/tests/MochiKit_Unit_Tests/test_MochiKit-Async.html",
     "note" : []
   }

You can see I now can run or skip tests that are tagged ‘orange’ or ‘timeout’. Better yet, I can skip tests with fennec-results that match fail > 0 if I want everything to be green.

So I took this a bit further, since I wanted to turn these on or off depending on whether I wanted to run tests or investigate bugs, and I allowed for a filter language to be parsed in mochitest/tests/SimpleTest/setup.js.  I then modified my runtestsremote.py (a subclass of runtests.py) so that when launching mochitest from the command line I could control this like:

#run all tests that match the 'orange' tag
python runtestsremote.py --filter='run:fennec-tags(orange)'

#skip all tests that have the timeout tag
python runtestsremote.py --filter='skip:fennec-tags(timeout)'

#skip all tests that have failures > 0
python runtestsremote.py --filter='skip:fennec-results(fail>0)'

You should now see the power of this filtering and that with some more detailed thinking we could have a powerful engine to run what we want.  Of course this can run on regular mochitest (if you take the code in this patch from runtestsremote.py and add it to runtests.py.in) and run all the orange tests in a loop or something like that.
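To show how little machinery a filter like this needs, here is a Python sketch that parses the three filter strings above and applies them to the two JSON entries shown earlier. The real parsing lives in setup.js; the grammar, the function names, and the rule that a test without the named field never matches are my assumptions for this sketch.

```python
import re

# Assumed grammar: action:field(key) or action:field(key<op>number)
FILTER = re.compile(r"^(run|skip):([\w-]+)\((\w+)(?:(>|<|=)(\d+))?\)$")


def parse_filter(text):
    """Split a filter string into (action, field, key, op, value)."""
    match = FILTER.match(text)
    if not match:
        raise ValueError("unrecognized filter: %r" % text)
    action, field, key, op, value = match.groups()
    return action, field, key, op, (int(value) if value is not None else None)


def matches(test, field, key, op, value):
    """True if the test has the tag, or its result satisfies the comparison."""
    data = test.get(field, {})
    if key not in data:
        return False
    if op is None:
        return True  # plain tag filter: presence is enough
    actual = data[key]
    return {">": actual > value, "<": actual < value, "=": actual == value}[op]


def select(tests, filter_text):
    """Return the tests that should run under the given filter."""
    action, field, key, op, value = parse_filter(filter_text)
    if action == "run":
        return [t for t in tests if matches(t, field, key, op, value)]
    return [t for t in tests if not matches(t, field, key, op, value)]


tests = [
    {"fennec-tags": {"orange": "", "remote": "", "timeout": ""},
     "name": "/tests/toolkit/content/tests/widgets/test_tooltip.xul"},
    {"fennec-results": {"fail": 0, "todo": 0, "pass": 51},
     "name": "/tests/MochiKit_Unit_Tests/test_MochiKit-Async.html",
     "note": []},
]

for text in ("run:fennec-tags(orange)",
             "skip:fennec-tags(timeout)",
             "skip:fennec-results(fail>0)"):
    print(text, [t["name"] for t in select(tests, text)])
```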

As a note, I previously mentioned a parsing script.   With some cleanup you could automatically create this json filter file based on tinderbox runs and fill in the tags to identify scenarios like orange (failing sometimes), timeouts, focus problems, etc…

Happy filtering!


Filed under general, testdev

patches checked in, tests can run on windows mobile!

My previous posts on the status of winmo automation outlined a series of patches to land. I am proud to say all of those have been reviewed (thanks to everybody), have landed (thanks ctalbert for checking these in) and with the help of this buildbot shim script are running very well!

So some highlights of what works:

  • Mochitest: runs great on the phone, and we use the --total-chunks and --this-chunk options so we don’t run out of memory. Right now I am testing it with --total-chunks=20, but I suspect we can go a bit lower. As another note, the overhead to restart the phone, install the build, load the browser, and start the tests is 7.5 minutes on my HTC Touch Pro.
  • Reftest/Crashtest/JSreftest: all of these run great. We need to run smaller manifest files, though, because after about 800 files on my device the tests slow to a crawl and execute maybe 1 test every few minutes. Luckily these run from manifest files, so we can easily create a few smaller manifests and have a working solution.
  • Xpcshell: this is pretty straightforward. I don’t see any problems running this end to end, as the harness by design only runs one case at a time. As a note, this is the only test harness that copies the test files over to the phone.
  • shim script: this turns on the webserver on your local IP so we can access it from the phone, and handles a bunch of other setup, monitoring and cleanup tasks. It would be nice to move the webserver functionality into the remote harness scripts in the near future so developers can easily run from a build tree.
  • sutagent: this is actually the backbone of these tests. This tool runs on the phone and has come a long way over the last few months. This agent is a product of blassey and bmoss. The next step here is to get the code checked into one of our source trees.
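The chunking idea from the Mochitest bullet can be sketched as below. This is only an illustration of splitting a flat list into near-equal pieces with a 1-based chunk number; it is not the algorithm the mochitest harness uses, and the `chunk` helper and its arguments are hypothetical names.

```python
def chunk(tests, total_chunks, this_chunk):
    """Return the tests belonging to chunk number this_chunk (1-based)."""
    if not 1 <= this_chunk <= total_chunks:
        raise ValueError("this_chunk must be between 1 and total_chunks")
    size, extra = divmod(len(tests), total_chunks)
    # The first `extra` chunks each take one additional test.
    start = (this_chunk - 1) * size + min(this_chunk - 1, extra)
    end = start + size + (1 if this_chunk <= extra else 0)
    return tests[start:end]


# Hypothetical test list.
tests = ["test_%02d.html" % i for i in range(10)]
for n in (1, 2, 3):
    print(n, chunk(tests, 3, n))
```

Because each run restarts the phone and reinstalls the build, every extra chunk adds the fixed setup overhead mentioned above, so the chunk count is a trade-off between memory pressure and total runtime.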

There are a few things we want to clean up, but overall we are at a great milestone on this project and ready to start rolling this out.


Filed under testdev

making the most of unittests on a phone

Background:
As we get closer to having unittests running on windows mobile, I am starting to wonder how long it will take to complete a test run. Just like maemo with the n810s, we will be running the tests in parallel, but can we see full build + test + results in < 8 hours for a nightly?

For desktop builds, we have an enormous amount of building and testing that we do. This means we would be running tests for about 512 checkins/month (just less than 1/hour) and double that if we wanted to include the try server.

Proposal:
What I propose doing is running a small set of unittests on each checkin, such that we get simple coverage on each build and it runs fast enough that we have results in the same time window as our other test data comes in for desktop tests. Right now when you check into m-c or 1.9.2, tests are only run on desktop builds; this would run tests on windows mobile as well.

The big question is what tests to run. Here are a couple ideas:

  1. Run a small smoketest for each test suite (reftest, mochitest, xpcshell) which covers different areas, but not in as much depth. Save the full tests for the nightlies.
  2. Iterate through a series of chunks (say 30 chunks for mochitest; desktop builds split it 5 ways) and for each checkin just run a single chunk. Over the course of about 2 days we will have done a full cycle of little chunks.
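Here is a quick Python sketch of the second idea together with the arithmetic behind the "2 days" estimate. The 512 checkins/month and 30 chunks come from the text above; the 30-day month and the simple round-robin rule are assumptions for this sketch.

```python
CHECKINS_PER_MONTH = 512    # figure quoted above
HOURS_PER_MONTH = 30 * 24   # assumes a 30-day month
TOTAL_CHUNKS = 30           # proposed mochitest split


def chunk_for_checkin(checkin_index, total_chunks=TOTAL_CHUNKS):
    """Round-robin: checkin 0 runs chunk 1, checkin 1 runs chunk 2, and so on."""
    return checkin_index % total_chunks + 1


checkins_per_hour = CHECKINS_PER_MONTH / HOURS_PER_MONTH
hours_per_cycle = TOTAL_CHUNKS / checkins_per_hour

print(round(checkins_per_hour, 2), "checkins/hour")
print(round(hours_per_cycle, 1), "hours for a full cycle of chunks")
```

At the quoted checkin rate, a full pass over all 30 chunks takes a bit under two days, so in the worst case a regression in a given chunk would go unnoticed for about that long unless a nightly full run catches it first.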

Personally I like the second option best, but really I am trying to think outside the box for ways to reduce regression windows while getting feedback on a per-checkin basis. What do you think?


Filed under general, testdev