Category Archives: general

Mozilla A-Team – Unraveling the mystery of Talos – part 1 of a googol

Most people at Mozilla have heard of Talos.  If you haven’t, Talos is the performance testing framework that runs for every checkin that occurs at Mozilla.

Over the course of the last year I have had the opportunity to extend, modify, retrofit, and rewrite many parts of the harness and tests that make up Talos.

It seems that once or twice a month I get a question about Talos.  Wouldn’t it be nice if I documented Talos?  When Alice was the main owner of Talos, she wrote up some great documentation, and as of today I am announcing that it has gone through an update:

https://wiki.mozilla.org/Buildbot/Talos.

Stay tuned as there will be more updates to come as we make the documentation more useful.

3 Comments

Filed under general, testdev

Two failed attempts with technology today, just one of those days

Today I experienced two WTF moments while trying to use computers:

1) BrowserID ended up being a total failure for me

2) Accessing people.mozilla.org is next to impossible when trying to share files across computers

I have heard great things about BrowserID, and today was my first real chance to use it.  I had an account on builder.addons.mozilla.org, and it was tied to my <me>@mozilla.com email address.  It has been a few months since I had been on there, and now it uses BrowserID for all access.  Great!!  I had signed up for BrowserID with my <me>@gmail.com address, but that failed to log me in.  So I clicked ‘add another email address’ and got a verification email in my inbox.  Verifying proved impossible, with only some cryptic error messages to go on.  After 10 more minutes of trying to log in, I finally found my way to #identity and was told to try it again.  It magically worked.  OK, let me log in to my addons account: no luck.  After 15 more minutes of poking around, I found that my @mozilla.com email address worked with BrowserID just fine (by testing it on another site), but it still failed on addons.

Here is my take on the problem:

  • BrowserID is supposed to make logging in easier; after 30 minutes of debugging I still cannot log in.
  • There are no useful error or help messages on the BrowserID site, nor on AMO.  How could my mom figure this out?
  • Where in the world is my ‘I forgot my username/password’ link?  Honestly, I could have signed up on AMO with a totally random email address and wasted a lot of time.
  • I found it easier to sign up as a new user with a different BrowserID email than to figure out how to log in with my normal account.

My next problem was with accessing people.mozilla.org.  I have been using this on a regular basis for 3.5 years.  I put log files up there for people to read, zip files when I want to share some code or a build, and sometimes I create a webpage to outline data.  I depend on this as a workflow since I know of no other file server at Mozilla that I can just scp files to.  Just this past weekend, some work was done on the server and the permissions got messed up.  This was fixed, then it wasn’t, then it was fixed, and now it isn’t.  I can detect patterns, and that is a pretty easy pattern to detect.  What really gets me is this message when I log in:

Last login: Thu May 17 18:41:20 2012 from zlb1.dmz.scl3.mozilla.com
All files stored on this server are subject to automated scans.
You shouldn’t store sensitive information on this server, and you should
avoid having production services depend on data stored here.
Files in ~/public_html may be seen by anyone on the Internet.
[jmaher@people1.dmz.scl3 ~]$

Who in their right mind would think that putting files in a folder called ‘public_html’ would keep them from being seen by anyone on the Internet?  I expect tomorrow I will have to sign an NDA to access my people.mozilla.org account.

The big problem here is that I wasted 20 minutes doing a task that normally takes me 2 minutes, and I delayed getting a perma-red test fixed because I couldn’t find a place to upload a fixed talos.zip to.

Enough complaining and ranting; back to work on reftests for Android native!

5 Comments

Filed under general, personal, reviews

Professional Development, Improv and your audience

I had the opportunity to attend some really exciting professional development sessions at the All Hands.  Personally I found these very interesting, but I heard a lot of grumbling about how they did not add a lot of value or interest.

One reason I found these interesting is that in a previous life I attended a few years of Improv acting classes and did a short stint of real onstage Improv acting.  Looking back, these professional development sessions reminded me of the core concepts we learned in Improv 101.  So if you felt that you missed out, sign up for an Improv class.  Maybe if there are professional development sessions at a future event, they could just hold an Improv acting class.

Related to the professional development courses, I found that most of them were sparsely attended.  From those who did attend, the courses received great reviews/ratings.  To be fair, the technical tracks that I attended had about the same attendance as the professional development tracks.  Maybe we are not creating sessions that are of interest to our audience?  I know for the technical tracks we just propose something and it magically becomes a session.  I don’t recall getting any input into what sessions would be available to me.  Maybe in the future we can do a better job of getting input from the community (a.k.a. the audience)!

1 Comment

Filed under general, reviews

mochikit.jar changes are in mozilla central

Last night we landed the final patches to make mochikit.jar a reality.  This started out as a bug to package the mochikit harness + chrome tests into a single .jar file, copy that to the application directory on a remote system, and run the tests locally.  It ended up being much more than that; let me explain some of the changes that have taken place.

why change all of this?

In order to test remotely (on mobile devices such as Windows Mobile and Android), where there are no options for running tools and python scripts, we need to package everything the browser needs and launch the browser remotely.  The solution for tests that are not accessible over the network is to run them from local files.

what is mochikit.jar?

Mochikit.jar is an extension that is installed in the profile and contains all the core files that mochitest (plain, chrome, browser-chrome, a11y) needs to run in a browser.  It doesn’t contain any of the external tools such as ssltunnel or the python scripts that set up a webserver.  When you do a build, you will see a new directory in $(objdir)/_tests/testing/mochitest called mochijar; feel free to poke around there.  When running as a standalone application, all chrome://mochikit/content calls will use this extension, not a pointer to the local file system.  The original intention of mochikit.jar was to include the test data as well, but we found that creating an extension using jar.mn requires a concrete list of files, and that was not reasonable to do for our test files.  So we created tests.jar.

what is tests.jar?

tests.jar holds the actual test data for browser-chrome, chrome, and a11y tests.  These are all tests that are not straightforward to access remotely over http, so we run them locally out of a .jar file.  tests.jar is only created when you do a ‘make package-tests’ and ends up in the root of the mochitest directory as tests.jar.  If the harness finds this file, it copies it to the profile and generates a .manifest file for the .jar file; otherwise we generate a plain .manifest file that points to the filesystem.  Finally, we dynamically register tests.manifest from the profile.  Now all your tests will live under chrome://mochitests/content instead of chrome://mochikit/content.
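Just to illustrate, the two flavors of that generated manifest might look something like the lines below (the package name matches the chrome://mochitests/content URI above, but the exact paths here are made up):

  # tests.jar was found: register the test data out of the jar copied into the profile
  content mochitests jar:tests.jar!/content/

  # no tests.jar: point directly at the test files on the local filesystem
  content mochitests file:///path/to/objdir/_tests/testing/mochitest/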

What else changed?

A lot of tests had to change to work with this because we had hardcoded chrome://mochikit/content references in our test code and data.  It is fine to have that for references to the harness and core utilities, but many tests used it to reference a local piece of test data, where it was hardcoded and didn’t need to be.  A few tests required more difficult work, where we had to extract files temporarily to a temp folder in the profile and reference them with a file path.

what do I need to do when writing new tests?

Please don’t cut and paste code and then change it to reference a data, utility, or other URI that has chrome://mochikit/content/ in it.  If you need to access a file with the full URI or as a file path, here are some tips:

* a mochitest-chrome test that needs to reference a file in the same directory or subdir:
let chromeDir = getRootDirectory(window.location.href);

* a browser-chrome test that needs to reference a file in the same directory or subdir:
//NOTE: gTestPath is set because window.location.href is not always set on browser-chrome tests
let chromeDir = getRootDirectory(gTestPath);

* extracting files to temp and accessing them

  let rootDir = getRootDirectory(gTestPath);
  let jar = getJar(rootDir);
  if (jar) {
    let tmpdir = extractJarToTmp(jar);
    rootDir = "file://" + tmpdir.path + '/';
  }
  loader.loadSubScript(rootDir + "privacypane_tests.js", this);

Leave a comment

Filed under general, testdev

filtering mochitests for remote testing

One major problem we encounter in running all our unittests on Fennec is the large volume of failures and how to manage them.  Currently we have turned off mochitests on the Maemo tinderbox because nobody looked at the results (we still run reftest/crashtest/xpcshell!).

In many of my previous posts I outlined methods for running tests remotely, and that has proven to be very useful.  In order to test this code and continue developing it (without Windows Mobile or a working Android build yet), I have developed a simple python test-agent that can run on a Linux box (including an n900).  If you are curious, check it out and watch tests run remotely… it is pretty cool.

So the real problem I need to solve is how to avoid running a given list of tests on a mobile device.  Solving this could get us to green faster and cut the mochitest runtime in half!

In 2008 my solution was Maemkit.  Maemkit is just a small wrapper around the python test runner scripts that does some file (renaming) and directory (splitting into smaller chunks) manipulation to allow for more reliable test runs.  This has worked great and still works.  Enter remote testing, though, and we would need to hack up Maemkit a lot to accommodate everything.  In addition, a lot of the work Maemkit does is already in the test runners.

Today I have moved the filtering to be a bit more configurable and less obscure.  This is really just a prototype and a toolset to solve a problem for me locally, but the idea is worth sharing.  What I have done is build up a json file (most of this was done automatically with this parsing script) which outlines each test and has some ‘tags’ that I can filter on:

   {
     "fennec-tags" : {"orange": "", "remote": "", "timeout": ""},
     "name" : "/tests/toolkit/content/tests/widgets/test_tooltip.xul"
   },
   {
     "fennec-results" : {"fail": 0, "todo": 0, "pass": 51},
     "name" : "/tests/MochiKit_Unit_Tests/test_MochiKit-Async.html",
     "note" : []
   }

You can see I can now run or skip tests that are tagged ‘orange’ or ‘timeout’. Better yet, I can skip tests with fennec-results that match fail > 0 if I want everything to be green.

So I took this a bit further.  Since I wanted to turn these on or off depending on whether I wanted to run tests or investigate bugs, I allowed for a small filter language to be parsed in mochitest/tests/SimpleTest/setup.js.  I then modified my runtestsremote.py (a subclass of runtests.py) so that when launching mochitest from the command line I could control this like:

#run all tests that match the 'orange' tag
python runtestsremote.py --filter='run:fennec-tags(orange)'

#skip all tests that have the timeout tag
python runtestsremote.py --filter='skip:fennec-tags(timeout)'

#skip all tests that have failures > 0
python runtestsremote.py --filter='skip:fennec-results(fail>0)'

You should now see the power of this filtering, and with some more detailed thinking we could have a powerful engine to run exactly what we want.  Of course this can run on regular mochitest (if you take the code in this patch from runtestsremote.py and add it to runtests.py.in) and do things like run all the orange tests in a loop.
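To make the idea a bit more concrete, here is a minimal sketch (not the actual patch; the function names and the tests.json filename are made up) of how a filter expression like the ones above could be parsed and applied to the json file:

  import json
  import re

  def parse_filter(expr):
      # 'run:fennec-tags(orange)' -> ('run', 'fennec-tags', 'orange')
      action, rest = expr.split(':', 1)
      field, cond = re.match(r'(.+)\((.+)\)', rest).groups()
      return action, field, cond

  def matches(entry, field, cond):
      data = entry.get(field, {})
      numeric = re.match(r'(\w+)\s*([<>]=?)\s*(\d+)$', cond)
      if numeric:
          # a numeric condition such as 'fail>0' against fennec-results
          key, op, value = numeric.groups()
          compare = {'>': lambda a, b: a > b, '<': lambda a, b: a < b,
                     '>=': lambda a, b: a >= b, '<=': lambda a, b: a <= b}
          return compare[op](data.get(key, 0), int(value))
      # otherwise a simple tag such as 'orange' or 'timeout'
      return cond in data

  def filter_tests(json_file, expr):
      action, field, cond = parse_filter(expr)
      with open(json_file) as f:
          entries = json.load(f)
      if action == 'run':
          return [e['name'] for e in entries if matches(e, field, cond)]
      # action == 'skip'
      return [e['name'] for e in entries if not matches(e, field, cond)]

  # e.g. filter_tests('tests.json', 'skip:fennec-results(fail>0)')

This is just enough to show the shape of it; the real work is wiring the resulting list into the harness so only those tests get run.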

As a note, I previously mentioned a parsing script.   With some cleanup you could automatically create this json filter file based on tinderbox runs and fill in the tags to identify scenarios like orange (failing sometimes), timeouts, focus problems, etc…

Happy filtering!

Leave a comment

Filed under general, testdev

Stair climbing as a sport

This post deviates from my slew of Mozilla automation-related posts, but feel free to read along. After moving to a high-rise condo building a year ago, I started taking the stairs (to the 39th floor) instead of the elevator a few times per week. As time went on this became enjoyable, and I could make it to the top without falling on the floor with shaky legs, gasping for air, on the verge of needing life support (just ask my wife about my earlier climbs).

Time to step it up (pardon the pun). I became involved in some of the online groups and noticed a lot of other people getting into the sport. It seems like the last couple of years have seen an explosion of participants, elite competitors (ones who actually run up the stairs), and events in cities all over the world. Earlier this month I took the plunge and signed up for my first race. It is small in comparison to the Sears Tower or the CN Tower, but you have to start somewhere.

Wish me luck in 4 weeks, and consider climbing the stairs next time you are waiting for the elevator; it really is fun.

11 Comments

Filed under general, personal

Happy PI Day

Just a quick post to wish everybody a happy PI day. If you want to celebrate, I suggest buying a t-shirt with PI on it, eating some form of pie, or better yet, calculating or reciting digits of PI.

4 Comments

Filed under general

making the most of unittests on a phone

Background:
As we get closer to having unittests running on Windows Mobile, I am starting to wonder how long it will take to complete a test run. Just like Maemo with the n810s, we will be running the tests in parallel, but can we see a full build + test + results cycle in < 8 hours for a nightly?

For desktop builds we already do an enormous amount of building and testing. Matching that on mobile means we would be running tests for about 512 checkins/month (just less than 1 per hour), and double that if we wanted to include the try server.

Proposal:
What I propose is running a small set of unittests on each checkin, such that we get simple coverage of each build and it runs fast enough that we have results in the same time window as our other desktop test data. Right now when you check into m-c or 1.9.2, tests are only run on desktop builds; this would run tests on Windows Mobile as well.

The big question is what tests to run. Here are a couple ideas:

  1. Run a small smoketest for each test suite (reftest, mochitest, xpcshell) which covers different areas, but not in as much depth. Save the full run for the nightlies.
  2. Iterate through a series of chunks (say 30 chunks for mochitest; desktop builds split it 5 ways) and for each checkin just run a single chunk. Over the course of 2 days we will have done a full cycle of the little chunks.

Personally I like the second option best, but really I am trying to think outside the box for ways to reduce regression windows while getting feedback on a per-checkin basis. What do you think?
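As a rough sketch of the second option (assuming the existing --total-chunks/--this-chunk options in the mochitest runner; the numbers are only illustrative), the chunk to run could simply rotate with the checkin count:

  TOTAL_CHUNKS = 30

  def chunk_for_checkin(checkin_number):
      # rotate through the chunks; returns the 1-based index expected by --this-chunk
      return (checkin_number % TOTAL_CHUNKS) + 1

  # checkin number 73 would run chunk 14, i.e. something like:
  #   python runtests.py --total-chunks=30 --this-chunk=14
  print(chunk_for_checkin(73))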

Leave a comment

Filed under general, testdev

Fennec proved more useful than Opera

Yesterday, while driving to Mountain View from Oakland, we were trying to find some dinner in the general Mountain View area. I pulled out my Omnia II, launched Opera, and found a Thai restaurant in Palo Alto! Unfortunately, with Opera I was unable to get a Google map to load… FAIL.

I had a build of Fennec on the filesystem that I was using for mochitest development, so I launched it and tried the same search. Fully expecting a crash or two, I was surprised when I got search results in the same perceived amount of time as I did in Opera. The best thing was I could get a Google map and zoom in/out to get the details I needed to figure out how to get to the restaurant.

End result: Fennec saved the day and proved itself a useful browser. Time to clean it up and release Alpha 4!

7 Comments

Filed under general, reviews

Making mobile automation development more efficient

In a recent discussion with ctalbert, we talked about the right course to take for getting automation running on Windows Mobile and how we could avoid this 6-month nightmare for the next platform. The big question we want to answer is: how can we implement our automation for Talos and unittests on Windows Mobile without having to reinvent the wheel for the next platform?

Our approach to date has been to take what we have for Firefox (which works really well) and adapt it to work on a mobile device. This approach has not been very successful for the development of Fennec (functionality, UI, performance) nor for the automation of unittests and Talos.

After further thought on this subject, I find there are 4 areas to focus on:

  1. Infrastructure and Tools
  2. Porting Harnesses and Tests
  3. Managing Automation and Failures
  4. Mobile Specific Automation

Each of these 4 areas is tightly coupled to the others, yet requires a much different solution. Let me outline each area in a bit more detail, describing the problem, our current solution, and a longer term solution.

Infrastructure and Tools:

This is the area of all the assumptions: the OS you run on, network connectivity, available disk space for test and log files, tools to manage your environment and processes, and a method for doing all of this automatically. I am going to leave the build and reporting infrastructure out of this topic, as those take place separately and don’t run on the device.

Our first port of this, to Maemo, was not as difficult because we could use python to manage files, the network, and processes just as we do on a desktop OS. Minor adjustments had to be made for running on storage cards, for using different python libraries (we have a limited set available on Maemo) and system tools, and for changed requirements around process names and directory structures. Also, Maemo has ssh functionality, a cli terminal, and a full set of command line tools to manage things.

Moving on to Windows Mobile, we don’t have the tools and infrastructure that we do on Maemo. This means we need to spend a lot of time hacking on the communications required for automation and on scripting tools like python. Almost all process interaction (create, query, kill) needs custom code to take care of it. This has presented us with a problem where we don’t have the luxury of OS and tool support. Our approach to date has been to write (or in the case of python, port) our tools to make them work on the device. Unfortunately, after 4 months we don’t have a working set of automation that people are happy with.

How can we create infrastructure that scales to all platforms? From what I have seen, we need to move away from our reliance on having all the tools on the device. This means no python or web server on the device, no test data on the device, and assuming we won’t be able to store log files or use common system tools. I would like to see a custom communication layer for each OS. For Windows Mobile, we would create a server that lives on the device which ensures we have ip connectivity, provides file system and process management tools, and allows the OS to reboot and come back connected. The other half of this is a job server which sends commands to the device and serves/collects test data via a web interface. I know this is a big step from where we are now, but in the future it seems like the easiest approach.
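To give a feel for the device side of that idea, here is a minimal sketch in python (purely illustrative; the command names and port are invented, on Windows Mobile the agent itself would have to be written natively, and a real agent would also need process management, reboot handling, and error checking):

  import os
  import socket

  def handle(command, arg):
      if command == 'ls':      # file system tools
          return '\n'.join(os.listdir(arg or '.'))
      if command == 'exists':  # quick check before pushing test files
          return str(os.path.exists(arg))
      if command == 'ping':    # verify ip connectivity, e.g. after a reboot
          return 'pong'
      return 'unknown command'

  def serve(port=20700):
      # the job server connects here, sends one command per connection, reads the reply
      srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      srv.bind(('', port))
      srv.listen(1)
      while True:
          conn, _ = srv.accept()
          line = conn.recv(1024).decode().strip()
          command, _, arg = line.partition(' ')
          conn.sendall((handle(command, arg) + '\n').encode())
          conn.close()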

Porting Harnesses and Tests:

This focus area is more about making sure the environment is set up correctly, the right tests are run, and useful logs are created. This can be developed without a full infrastructure in place, but it really requires some knowledge about the infrastructure.

For Maemo, a lot of work was done to extract the unittests from the source tree and retrofit the tools and harnesses to manage tests in “chunks” rather than running them all at once. We have also done a lot of work to clean up tests that assume certain preferences or that look for certain features.

The challenge on Windows Mobile is that, without the infrastructure the tests rely on, we need to do things differently. Very few bugs that prevented tests from running were found while porting tests to Maemo. For WinMo, that is a different story. We cannot run the web server locally, cannot load our mochitests in an iframe, and have trouble creating log files. All of these issues force us to morph our testing even further away from where it was and realize that we need to do this differently.

What I see as the ultimate solution here is to set up a “test server” which serves all our test data. Each test would remove the dependencies it has on localhost and work with a test server, given its IP address. We would then have an extension for Firefox/Fennec which would serve as the test driver to set up, run, and report the test results. This would allow any mobile device (that supports addons), desktop, or community member to install the addon and start testing.
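The server half of that could start out as simple as this sketch (python stdlib only; the directory name is an assumption, and a real setup would still need the ssltunnel/proxy pieces the harness provides today):

  import os
  from http.server import HTTPServer, SimpleHTTPRequestHandler

  # serve the packaged mochitest data so a device only needs this machine's address
  os.chdir('_tests/testing/mochitest')
  HTTPServer(('', 8888), SimpleHTTPRequestHandler).serve_forever()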

Managing Automation and Failures:

This is a much smaller topic than the previous two, but once we do have data generated, how do we keep generating it reliably and use it to our advantage?

Right now Tinderbox and Buildbot do a great job for Firefox, and I believe that we can still utilize them for mobile. There might be specific issues with them, but for the purposes of running Talos and unittests, we need something that will take a given build, send it to a machine, run tests, and figure out when it is done. We even have great tools to notify us via IRC when a test suite fails.

The danger here is that when testing on a new device or product, we find hundreds if not thousands of failures. The time required to track those down and file bugs is a big job by itself. When you have a large number of bugs waiting to be fixed, it won’t happen in the same week (or quarter). This brings up a problem where nobody pays attention to the reported data because there are always so many failures.

The other problem is that with so many crashes, and with running tests in smaller chunks or 1 by 1, we end up with smaller log files and lose the ability to get the pass/fail status that our test harnesses for Firefox rely upon. I know simply looking for a TEST-UNEXPECTED string in *.log is a reasonable solution, but as we have learned there are a lot of corner cases, and that doesn’t tell you which tests were not run.

How can we make this better? Our first step toward solving this problem is LogCompare. This is a log parser that uploads data to a database and lets us compare build by build. It solves the problem of finding new failures and of ignoring known failures if we want to. A final solution would be to expand on this idea and have test runners (via the addon) upload test result blobs to a database. Adding more tools to query the status of a test run, missed tests, etc… can give us more detailed reports than just pass/fail. In the long term a tool like this can be used to look at random orange results as well as to get status from the many community members pitching in CPU time.
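This is not LogCompare itself, just a tiny sketch of the ‘compare build by build’ idea; the log file names are made up and it only keys off the TEST-UNEXPECTED lines mentioned above:

  import re

  FAILURE = re.compile(r'TEST-UNEXPECTED-\w+ \| ([^|]+) \|')

  def failures(logfile):
      # collect the test names that reported an unexpected result
      found = set()
      for line in open(logfile):
          m = FAILURE.search(line)
          if m:
              found.add(m.group(1).strip())
      return found

  previous = failures('build-1234.log')
  current = failures('build-1235.log')
  print('new failures:', sorted(current - previous))
  print('known failures:', sorted(current & previous))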

Mobile Specific Automation:

The last piece of the puzzle is creating test cases specific to mobile features. This is fairly trivial, and great work has already been done for bookmarks and tabs using browser-chrome. This is important because the more developers we have working on mobile and the more branches we have, the greater the frequency of regressions.

Here is the problem: the mobile UI is changing so rapidly that keeping up with the automation would be a full time job. That is assuming you have comprehensive tests written. It is actually faster to install a build and poke around the UI for 10 minutes than it is to keep the tests maintained. I know in reality there will be moving pieces on a regular basis, but right now we are ripping big pieces out and rewriting everything else. As a reference point, the tab functionality has changed 4 times in the last year.

Looking at the regressions found in the last couple of weeks, we would not have caught them with automation. There is a great list of things we want to automate in the Fennec UI, and almost none of those would have failed as the result of a recent regression. This means we need many more tests than a few hundred core functionality tests. It also points out that we are not going to catch everything, even if we all agree that our tests are comprehensive.

What is the best way to utilize automation? Until a 1.0 release, we have to expect that things will be very volatile. We should fix the automation we currently have and require developers to run it on their desktop or device before checking in. This should be a 1.0 requirement: if a developer is changing functionality, fix the tests. This works because we don’t have a lot of tests right now, so it will serve more to fix the process than to find bugs. Post 1.0, let’s build up the automation to have decent coverage of all the major areas (pan, zoom, tabs, controls, awesomebar, bookmarks, downloads, addons, navigation), and keep the requirement that these tests need to run for each patch we check in. On a desktop, the time to run the tests will be fast, less than 5 minutes.

Summary:

While we seem to be flopping around like a fish out of water, we just need some clear focus and agreement from all parties about the direction, and then we can have a great solution. My goal is to be forward looking and not dwell on the existing techniques that work yet are still being improved. Looking at this from a future point of view, I see that developing a great solution to meet our needs now can also allow for greater community involvement, leading to greater test coverage for Fennec and Firefox. The amount of work required to generalize this is about the same as the work for a specialized Windows Mobile solution.

I encourage you to think about ways we can reduce our test requirements while allowing for greater flexibility.

2 Comments

Filed under general, testdev