Tag Archives: mobile

Android automation is becoming more stable: ~7% failure rate

At Mozilla we have made unit testing on Android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except for code that breaks a test, but in reality we allow for some random failures and infrastructure failures. Our current goal is 5%.

So what are these acceptable failures, and what does 5% really mean? Failures can happen when we have tests which fail randomly, usually poorly written tests or tests which were written a long time ago and hacked to work in today's environment. This doesn't mean any test that fails is a problem; it could be a previous test that changes a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly, or it crashed in the middle of the test. Other failures are the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems. With our current state of testing this mostly falls into the category of losing connectivity to the device. Infrastructure problems are indicated as Red or Purple, and test-related problems are Orange.

I took a look at the last 10 runs on mozilla-central (where we build Firefox nightlies from) and built this little graph:

[Figure: Firefox Android Failures]

Here you can see that test-related failures account for 6.67% of the runs, and overall we can expect a failure on Android 12.33% of the time.

We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in.  I did the same thing here:

[Figure: mozilla-inbound Android Failures]

Here you can see that test-related failures account for 7.77% of the runs, and overall we can expect a failure on Android 9.89% of the time.
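
For anyone curious how numbers like these get tallied, the arithmetic is simple. Here is a rough sketch in Python using a made-up sample (not the real tinderbox data), where Orange counts as a test failure and Red/Purple count as infrastructure failures:

# Rough sketch with made-up counts, not the real tinderbox data.
# "orange" = test failure, "red"/"purple" = infrastructure failure.
results = ["green"] * 26 + ["orange"] * 3 + ["red"] * 1

total = len(results)
test_failures = sum(1 for r in results if r == "orange")
infra_failures = sum(1 for r in results if r in ("red", "purple"))

print("test failure rate:    %.2f%%" % (100.0 * test_failures / total))
print("infra failure rate:   %.2f%%" % (100.0 * infra_failures / total))
print("overall failure rate: %.2f%%" % (100.0 * (test_failures + infra_failures) / total))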

This is only a small sample of the tests, but it should give you a good idea of where we are.

Filed under testdev

Talos, Remote Testing and Android

Last week I posted about mochikit.jar and what was done to enable testing on remote systems (specifically Android) for mochitest chrome style tests. This post will discuss the work done to Talos for remote testing on Android. I have been working with bear in release engineering a lot to flush out any bugs. Now we are really close to turning this stuff on for the public-facing tinderbox builds.

Talos + Remote Testing:

Last year, I added all the remote testing bits to Talos for Windows Mobile. Luckily this time around I just had to clean up a few odds and ends (adding support for IPC). Talos is set up to access a webserver and communicate with a SUTAgent (when you set up your .config file properly). This means you can have a static webserver on your desktop or on the network and run Talos against any SUTAgent from a host machine.

Talos + Android:

This is a harder challenge to resolve than remote testing. Android does not support redirecting output to stdout, which Talos required. For Talos and all related tests (fennecmark, pageloader) we need to write to a log file from the test itself.
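
In practice that means the harness pulls the log file back off the device after the run instead of parsing the browser's stdout. A minimal sketch of that idea, assuming a devicemanager-style object with a getFile() method (the method name and paths here are illustrative, not the exact API):

# Sketch only: fetch the test log from the device instead of reading stdout.
# dm is assumed to be a devicemanager-style object; paths are examples.
import os

def read_remote_log(dm, remote_log="/mnt/sdcard/tests/browser_output.txt",
                    local_log="browser_output.txt"):
    """Copy the log the test wrote on the device and return its contents."""
    dm.getFile(remote_log, local_log)   # pull the file over the agent connection
    if not os.path.exists(local_log):
        raise IOError("no log retrieved from device")
    return open(local_log).read()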

Run it for yourself:

Those are the core changes that needed to be made.  Here are some instructions for running it on your own:

hg clone http://hg.mozilla.org/build/talos

ln -s talos/ /var/www/talos #create a link on http://localhost/talos to the hg clone

python remotePerfConfigurator.py -v -e org.mozilla.fennec -t `hostname` -b mobile-browser --activeTests ts:tgfx --sampleConfig remote.config --browserWait 60 --noChrome --output test.config --remoteDevice 192.168.1.115 --webServer 192.168.1.102/talos

python run_tests.py test.config

* NOTE: 192.168.1.115 is the address of my android device (SUTAgent), and 192.168.1.102 is my webserver on my desktop

Filed under testdev

updated status of the winmo automation project

A couple weeks ago, I posted an initial status of the winmo automation project. Here is an update of where we are.

Only a few patches have landed since the last update, but a lot of reviews have happened. Great progress has been made in resolving some of the unknown issues with running xpcshell and reftest remotely, and everything is now truly up for review. I expect that at this time next week we will have a much shorter list of patches.

Filed under Uncategorized

More info on the localhost->mochi.test change

This is just another public service announcement about the upcoming change to the unit tests. Last week I wrote about ‘mochi.test is the new localhost’, and this week I am reiterating that with more details.

After some feedback on the bug for this, we now have these options available:
* mochi.test
* 127.0.0.1
* moztest

With these options for referencing data in a test, you should be able to cover all the different networking scenarios that need testing. Keep in mind that mochi.test is the primary host and if you are posting a message to the parent window, the default domain is mochi.test, not localhost.

There is one other scenario where you need the real IP address of the server; this can be done with code like this:
// nsIIOService gives us a URI object; nsIProtocolProxyService resolves the
// proxy info, which carries the real host and port the tests are served from.
var ios = Cc["@mozilla.org/network/io-service;1"]
            .getService(Components.interfaces.nsIIOService);
var pps = Cc["@mozilla.org/network/protocol-proxy-service;1"]
            .getService(Components.interfaces.nsIProtocolProxyService);
var uri = ios.newURI("http://example.com", null, null);
var pi = pps.resolve(uri, 0);
var host = "http://" + pi.host + ":" + pi.port;

Of all the tests, test_prompt_async is the only one that needs this type of solution.

When we start running tests on Windows Mobile using the remote webserver, there won't be any changes needed beyond what is mentioned above. Look for a demo from Clint at an upcoming Monday meeting.

Filed under qa, testdev

status of the winmo automated tests project

I have been posting about this project for a while, so I figured I should give an update. Currently patches are landing and we are starting to get the final set of patches ready for review.

  • Talos: This was the first part of this project and we have checked in 3 of the 4 patches to get Talos TS running. There is 1 patch remaining, which I need to upload for review.
  • Mochitest: There are 4 patches required for this to work:
    1. Fix tests to not use hardcoded localhost – early review stages
    2. Add CLI options to mochitest for remote webserver – I need to clean up my patch for review; we are at the end game
    3. Add devicemanager.py to the source tree – review started, waiting on sutagent.exe to resolve a few minor bugs
    4. Add runtestsremote.py to the source tree – review process started, waiting on other patches

    The good news is that all 4 patches are at the review stage.

  • Reftest: This requires 4 patches (1 is devicemanager.py from mochitest)
    1. Modify reftest.jar to support http url for manifest and test files – up for review
    2. Refactor runreftests.py – up for review
    3. Add remotereftests.py to source tree – needs work before review, but WIP posted

    Keep in mind here we are still blocked on registering the reftest extension. I also have instructions for how to setup and run this.

  • Xpcshell: This requires 3 patches (1 is device manager) and is still in WIP stages. There are two pieces to this that we still need to resolve: copying over the xpcshell data to the device and setting up a webserver to serve pages. Here are the two patches to date:
    1. Refactor runxpcshelltests.py to support subclass for winmo – WIP patch posted, close to review stage
    2. Add remotexpcshelltests.py to source tree – WIP patch posted

    I have written some instructions on how to run xpcshell tests on winmo if you are interested.

Stay tuned for updates as we start getting these patches landed and sort out our device selection/setup process.

Filed under testdev

first round of new test harness code has landed

Back in October, I started working on code to run the unittests on Windows CE + Mobile. This is an ongoing project, but I am finally starting to get the ball rolling in the right direction.

Today I checked in my first patch (actually a set of 2, and it didn't get backed out this time), which converts the bulk of the python test harness code to be OO instead of raw scripts.
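
To give a flavor of the shape of that change (the class and method names below are illustrative, not the exact ones that landed), the harness logic now hangs off classes that a remote variant can subclass instead of living in one flat script:

# Illustrative sketch only; not the actual patch that landed.
class Mochitest(object):
    """Base harness: build a profile, launch the browser, parse the log."""

    def buildProfile(self, options):
        # create the testing profile on the local disk
        pass

    def runApp(self, profile, testURL, options):
        # launch the browser against the test URL and wait for it to exit
        pass


class RemoteMochitest(Mochitest):
    """Remote variant: same flow, but files and processes live on a device."""

    def buildProfile(self, options):
        profile = Mochitest.buildProfile(self, options)
        # push the finished profile over to the device before running
        return profile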

This is sort of a halfway point in the code that needs to get checked into mozilla-central in order for us to be running test automation on a Windows Mobile phone. Big thanks to Ted for reviewing all my patches and to Clint for helping me test and do the actual checkin.

NOTE: I originally wrote this Jan 7th, and it finally made it in today :)

Filed under testdev

First look at the new remote testing

It has been almost 2 months since my last post and I have been heads down on bringing a new framework to life.

I discussed the 4 pieces involved in bringing automation to a new platform, and now we have what is shaping up to be a great approach to the infrastructure and harness development pieces.

What we have is a device-specific agent which is written in a native language (C/C++) and does a small number of things (launch processes, collect output, copy files to and from the device, query status such as processes, memory, CPU, and disk, and lastly identify the device and OS). This tool needs to act as a telnet server, allowing us to telnet to it and execute a series of commands. Brad Lassey has developed such a tool for WinMo/CE which works partially on Win32 as well. I spent a couple of days testing this interface and hammering out a python library to interact with a remote device. Now scripting file copy and process launching is easy. Clint has also used this tool to get mochitest and xpcshell running on Windows CE based devices!
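
The python library boils down to opening a telnet session to the agent and sending it commands. Here is a rough sketch of that idea; the port, prompt, and command names are placeholders for illustration, not the agent's actual command set:

# Sketch of the host-side library: drive the on-device agent over telnet.
# The port, prompt, and command strings are placeholders, not the real protocol.
import telnetlib

class DeviceAgentClient(object):
    def __init__(self, host, port=20701):
        self.conn = telnetlib.Telnet(host, port)
        self.conn.read_until("$>")          # wait for the agent's prompt

    def _command(self, cmd):
        self.conn.write(cmd + "\r\n")
        return self.conn.read_until("$>")   # everything up to the next prompt

    def launch_process(self, cmdline):
        return self._command("exec " + cmdline)

    def process_list(self):
        return self._command("ps")

# usage (addresses and paths are examples):
# agent = DeviceAgentClient("192.168.1.115")
# agent.launch_process("\\tests\\fennec\\fennec.exe http://192.168.1.102/talos")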

That takes care of most of our infrastructure; now we need to get this stuff working in our test harnesses. My first task (the only one cleaned up so far) is talos (see attached patch). This required massive changes to the Talos codebase, but no functional changes for desktop-based Talos.

I found, in adding support to Talos for a remote device (as well as initial work for xpcshell and mochitest), that once we develop a DeviceAgent for a given platform there will be almost no additional work required to start running tests! I might be out of a job!!!

Next week I am going to work with Aki to get talos up and running (trial runs…don’t expect true miracles) on winmo and reporting to a staging graph server. Following that, we will start cleaning up our other harness code and getting mochitest, xpcshell, and reftests underway.

Stay tuned for progress updates and more details about a much needed updating to the way automation is run!

Filed under testdev

Making mobile automation development more efficient

In a recent discussion with ctalbert, we talked about the right course to take for getting automation running on Windows Mobile and how we could avoid this 6-month nightmare for the next platform. The big question we want to answer: how can we implement our automation for talos and unittests on Windows Mobile without reinventing the wheel for the next platform?

Our approach to date has been to take what we have for Firefox (which works really well) and adapt it to work on a mobile device. This approach has not been very successful for the development of Fennec (functionality, UI, performance) or for the automation of unittests and talos.

After further thought on this subject, I find there are 4 areas to focus on:

  1. Infrastructure and Tools
  2. Porting Harnesses and Tests
  3. Managing Automation and Failures
  4. Mobile Specific Automation

Each of these 4 areas is tightly coupled to the others, yet requires a much different solution. Let me outline each area in a bit more detail, describing the problem, our solution, and a longer-term solution.

Infrastructure and Tools:

This is the area of all the assumptions: the OS you run on, network connectivity, available disk space for test and log files, tools to manage your environment and processes, and a method for doing all of this automatically. I am going to leave the build and reporting infrastructure out of this topic, as those take place separately and don't run on the device.

Our first port of this to Maemo was not as difficult because we could use python to manage files, network, and processes just as we do on a desktop OS. Minor adjustments had to be made for running on storage cards, using different python libraries (we have a limited set available on Maemo) and system tools, as well as changing requirements for process names and directory structures. Maemo also has ssh functionality, a CLI terminal, and a full set of command line tools to manage stuff.

Moving on to Windows Mobile, we don't have the tools and infrastructure we have on Maemo. This means we need to spend a lot of time hacking on the communications required for automation and on scripting tools like python. Almost all process interaction (create, query, kill) needs custom code to take care of it. This has presented us with a problem where we don't have the luxury of OS and tool support. Our approach to date has been to write (or in the case of python, port) our tools to make them work on the device. Unfortunately, after 4 months we don't have a working set of automation that people are happy with.

How can we create infrastructure that is scalable to all platforms? From what I have seen, we need to move away from our reliance on having all our tools on the device. This means no python or web server on the device, no test data on the device, and assuming we won't be able to store log files or use common system tools. I would like to see a custom communication layer for each OS. So for Windows Mobile, we would create a server that lives on the device which ensures we have IP connectivity, provides file system and process management tools, and allows the OS to reboot and come back connected. The other half of this is a job server which sends commands to the device and serves/collects test data via a web interface. I know this is a big step from where we are now, but in the future it seems like the easiest approach.
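
In rough code terms, the split I have in mind looks something like the sketch below. None of this exists yet; the agent object and its methods are hypothetical and only meant to show the division of labor between the on-device server and the host-side job server:

# Hypothetical host-side job runner; the agent client and its methods are
# placeholders for the on-device server described above.
def run_job(agent, local_build_zip, test_command, remote_log):
    agent.wait_for_connectivity()                      # device answers on its ip
    agent.push_file(local_build_zip, "\\tests\\fennec.zip")
    agent.launch_process(test_command)                 # e.g. the harness command line
    agent.pull_file(remote_log, "job.log")             # logs come back to the host
    agent.reboot_and_wait()                            # clean state for the next job
    return open("job.log").read()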

Porting Harnesses and Tests:

This focus area is more about making sure the environment is setup correctly, the right tests are run and useful logs are created. This can be developed without a full infrastructure in place, but it really requires some knowledge about the infrastructure.

For Maemo, a lot of work was done to extract the unittests from the source tree and retrofit the tools and harnesses to manage tests in “chunks” rather than running them all at once. We have done a lot of work to clean up tests that assume preferences or ones that look for features.

The challenge on Windows Mobile is that, without the infrastructure the tests rely on, we need to do things differently. Very few bugs that prevented tests from running were found while porting tests to Maemo. For WinMo, that is a different story. We cannot run the web server locally, cannot load our mochitests in an iframe, and have trouble creating log files. All of these issues force us to morph our testing even further away from where it was and realize that we need to do this differently.

What I see as the ultimate solution here is to setup a “test server” which serves all our test data. Each test would remove the dependencies it has on localhost and work with a test server given that it has an IP address. We would then have an extension for Firefox/Fennec which would serve as the test driver to setup, run, and report the test results. This would allow for any mobile device (that supports addons), desktop, or community member to install the addon and start testing.

Managing Automation and Failures:

This is a much smaller topic than the previous two, but once we do have data generated, how do we keep generating it reliably and use it to our advantage?

Right now our toolset of Tinderbox and Buildbot does a great job for Firefox. I believe that we can still utilize them for mobile. There might be specific issues with them, but for the purposes of running Talos and unittests, we need something that will take a given build, send it to a machine, run tests, and figure out when it is done. We even have great tools to notify us via IRC when a test suite fails.

The danger here is that when testing on a new device or product, we find hundreds if not thousands of failures. The time required to track those down and file bugs is a big job by itself. When you have a large number of bugs waiting to be fixed, they won't all get fixed in the same week (or quarter). This brings up a problem where nobody pays attention to the reported data because there are always so many failures.

The other problem is that with so many crashes, and with tests running in smaller chunks or 1 by 1, we end up with smaller log files and lose the ability to get the pass/fail status that our Firefox test harnesses rely upon. I know simply looking for a TEST-UNEXPECTED string in *.log is a reasonable solution, but as we have learned there are a lot of corner cases, and that doesn't tell you which tests were not run.
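
For illustration, here is roughly what that naive scan looks like, plus the extra step needed to spot tests that never ran at all by diffing against the list of tests we intended to run (the file names are examples, and this is not the LogCompare code):

# Naive scan sketch: flag TEST-UNEXPECTED lines, then diff the tests that
# reported anything against the tests we intended to run.  File names are examples.
import glob
import re

expected = set(open("all-tests.txt").read().split())   # tests we meant to run
seen = set()
failures = []

for logname in glob.glob("*.log"):
    for line in open(logname):
        m = re.search(r"TEST-(?:PASS|UNEXPECTED-\w+|KNOWN-FAIL)\s*\|\s*(\S+)", line)
        if m:
            seen.add(m.group(1))
            if "TEST-UNEXPECTED" in line:
                failures.append(line.strip())

print("%d unexpected failures" % len(failures))
print("%d tests never reported at all" % len(expected - seen))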

How can we make this better? Our first step to solving this problem is LogCompare. This is a log parser that uploads data to a database and lets us compare build by build. This solves the problem of finding new failures and ignoring known failures if we want to. A final solution would be to expand on this idea and have test runners (via the addon) upload test result blobs to a database. Adding more tools to query the status of a test run, missed tests, etc. can be done, giving us more detailed reports than just pass/fail. In the long term a tool like this can be used to look at random orange results as well as to get status from the many community members pitching in CPU time.

Mobile Specific Automation:

The last piece of the puzzle is to create specific test cases to exercise mobile features. This is fairly trivial and great work has already been done for bookmarks and tabs using browser-chrome. This is important as the more developers we have working on mobile and the more branches we have, the greater the frequency of regressions.

Here is the problem: the mobile UI is changing so rapidly that keeping the automation up to date would be a full-time job. This is assuming you have comprehensive tests written. It is actually faster to install a build and poke around the UI for 10 minutes than it is to keep the tests maintained. I know in reality there will be moving pieces on a regular basis, but right now we are ripping big pieces out and rewriting everything else. As a reference point, the tab functionality has changed 4 times in the last year.

Looking at the regressions found in the last couple of weeks, we would not have caught those with automation. There is a great list of stuff we want to automate in the Fennec UI, and almost none of it would have failed as the result of a recent regression. This means we need many more tests than a few hundred core functionality tests. It also points out that we are not going to catch everything even if we all agree that our tests are comprehensive.

What is the best way to utilize automation? Until a 1.0 release, we have to expect that things will be very volatile. We should fix the automation we currently have and require developers to run it on their desktop or device before checking in. This should be a 1.0 requirement. If a developer is changing functionality, fix the tests. This works because we don't have a lot of tests right now, so it will serve more to fix the process than to find bugs. Post 1.0, let's build up the automation to have decent coverage of all the major areas (pan, zoom, tabs, controls, awesomebar, bookmarks, downloads, addons, navigation), and keep the requirement that these tests need to run for each patch we check in. Running the tests on a desktop will be fast, under 5 minutes.

Summary:

While we seem to be flopping around like a fish out of water, we just need some clear focus and agreement from all parties about the direction, and we can have a great solution. My goal is to be forward looking and not dwell on the existing techniques that work, yet are still being improved. After looking at this from a future point of view, I see that developing a great solution to meet our needs now can also allow for greater community involvement, leading to greater test coverage for Fennec and Firefox. The amount of work required to generalize this is equivalent to the work required for a specialized solution for Windows Mobile.

I encourage you to think about ways we can reduce our test requirements while allowing for greater flexibility.

Filed under general, testdev

More details on how to run mochitests remotely

After having good success two weeks ago running mochitests on my windows mobile device using a remote web server, I have cleaned up a lot of my code and made this a better process. When I last posted, I had this remaining list of action items and I have appended my status:
* Sort out python script to generate mochitesttestingprofile and get it on the device - bug 512319
* Fix profile and tests to remove localhost/127.0.0.1 dependencies - bug 512319
* Fix tests to remove calls to local files (an example I found) - about 100 test files fail
* Test on a release build of Fennec with desktop tests.tar - more details below
* Verify tools like certutil.exe, ssltunnel.exe, etc. do not cause any problems - no progress
* Write tools in the python script to look for a test that doesn’t exit and clean up zombie processes - fixed with maemkit

I have yet to update maemkit officially, but that is in the works. I mentioned there is a quirk with running on a release build and tests.tar. The issue here is making sure you have the right build and binaries for the right platform. I know this sounds easy, but in order for me to run tests on Windows Mobile, I need to build a Windows Mobile binary and a test package for desktop.

Let me outline the steps necessary to help elaborate on this:
1) Build WinMo build and install on device (I usually take the .zip file, unzip, and copy to \tests\ so that I can run \tests\fennec\fennec)
2) Build a Windows desktop build (with my two patches, bug 508664 and bug 512319), run ‘make package-tests’, and untar the result in something like c:\tests (so you have c:\tests\bin, c:\tests\mochitest\, etc.).
3) Using the build from step #2, run ‘make package’ and unzip the package to c:\tests so you have c:\tests\firefox\firefox.exe.
4) Copy c:\tests\bin\* to c:\tests\firefox so we have xpcshell.exe in the correct directory.
5) Run: python runtests.py --appname=firefox.exe --remote-webserver=192.168.55.100:8888 --setup-only. Note: the IP address is the ActiveSync IP.
6) Create profile directory on device: c:\tools\pmkdir.exe \\tests\\mochitesttestingprofile\\
7) Copy mochitesttestingprofile to device: c:\tools\pput.exe -r c:\tests\mochitest\mochitesttestingprofile\* \tests\mochitesttestingprofile\
8) Edit httpd.js and server-locations.js in c:\tests\mochitest to change localhost and 127.0.0.1 to be 192.168.55.100
9) Launch web server (from the c:\tests directory):
firefox\xpcshell.exe -g firefox -v 170 -f mochitest\httpd.js -f mochitest\server.js
10) Launch fennec on the remote device:

c:\tools\prun.exe -w \tests\fennec\fennec.exe --environ:NO_EM_RESTART=1 -no-remote -profile \tests\mochitesttestingprofile\ http://192.168.55.100:8888/tests/toolkit/components/passwordmgr/test/test_xhr.html?logFile=%5ctests%5cmochi.log

That is the basic run. When I have maemkit updated, step 10 would become:
python maemkit-chunked.py --testtype=mochitest

I can automate a lot of these steps if I assume we are running over ActiveSync and make maemkit a bit smarter about the setup.
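
As a rough illustration of what that automation could look like, here is a sketch that shells out to the same helper tools used in the manual steps above. This is not the maemkit change; the paths and the ActiveSync address simply mirror the steps listed earlier:

# Sketch only: automate the device setup and launch steps with subprocess.
# Paths and addresses mirror the manual steps above; not the maemkit code.
import subprocess

TOOLS = "c:\\tools\\"
DEVICE_ROOT = "\\tests"
WEBSERVER = "192.168.55.100:8888"
TEST_URL = ("http://" + WEBSERVER +
            "/tests/toolkit/components/passwordmgr/test/test_xhr.html"
            "?logFile=%5ctests%5cmochi.log")

def run(cmd):
    # echo and run one of the ActiveSync helper tools
    print(" ".join(cmd))
    subprocess.check_call(cmd)

run([TOOLS + "pmkdir.exe", DEVICE_ROOT + "\\mochitesttestingprofile\\"])
run([TOOLS + "pput.exe", "-r",
     "c:\\tests\\mochitest\\mochitesttestingprofile",
     DEVICE_ROOT + "\\mochitesttestingprofile\\"])
run([TOOLS + "prun.exe", "-w", DEVICE_ROOT + "\\fennec\\fennec.exe",
     "--environ:NO_EM_RESTART=1", "-no-remote",
     "-profile", DEVICE_ROOT + "\\mochitesttestingprofile\\",
     TEST_URL])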

Filed under testdev

joel’s rage – dom tests on windows mobile

Previously I had mentioned getting Windows Mobile mochitests up and running. Since then they have been running mostly full time. I calculated that it would take about 50 hours to run the full test suite, assuming there were no errors that caused the browser to not exit automatically. Unfortunately I ran into a block of 234 test files, of which over 175 fail to execute (no log generated and the browser doesn't terminate within 5 minutes).

These are all dom-level1-core/test_hc_* tests. In trying to debug the problem (by editing a testcase), I found that I could pinpoint an issue, but then the test wouldn't pass twice in a row. Further tinkering showed that about 1 out of 4 times I ran a test (with or without changing it) it would pass. Looking at the log files I generated during my initial automation pass on the dom-level1-core directory, I see the same statistics (1 out of 4 passing) for the fully automated run. Time for a debug build to figure out what is going on.

With a debug build installed, every time I ran a test manually it would pass. The same test that would not pass two times in a row passed 6 times in a row. So much for a debug build helping me out. Next I thought that this might be related to running on a remote web server and one file at a time. I ran a series of tests on desktop Linux Fennec and they all ran end to end without hanging.

This is where I get stuck. My infrastructure is not the problem as the tests obviously run. The problem is they only run *reliably* on debug builds. This all points to some kind of timing issue with WinMo Fennec. Any tips for how to figure out what the problem is?

Filed under testdev