Tag Archives: talos

Are there any trends in our Talos regression bugs?

Now that we have a better process for taking action on Talos alerts and pushing them to resolution, it is time to take a step back and see if any trends show up in our bugs.

First I want to look at bugs filed/week:

[chart: Talos regression bugs filed per week]

This is fun to see, now what if we stack this up side by side with the alerts we receive:

[chart: bugs filed per week stacked next to alerts received per week]

We started tracking alerts halfway through this process.  The data shows that we file a bug for roughly 1 out of every 25 alerts.  I had previously stated it was closer to 1 in 33 alerts (it appears that figure averages in the first few weeks).

Let's see where these bugs are filed; here is a view of the different Bugzilla products:

[chart: bugs filed by Bugzilla product]

The Testing product is used for bugs where we cannot figure out the exact changeset; those get filed in Testing::Talos.  Since bugs end up in almost 30 unique components, I took a few minutes to look at the Core product; here is where the bugs live in Core:

[chart: Core bugs broken down by component]

Pardon my bad graphing attempt here with the components cut off.  Graphics is the clear winner for regressions (with “graphics: layers” being a large part of it).  Of course the JavaScript Engine and DOM would be there too (a lot of our tests are sensitive to changes in those areas).  This really shows where our test coverage is, more than where bad code lives.

Now that I know where the bugs are, here is a view of how long the bugs stay open:

[chart: how long Talos regression bugs stay open]

The fantastic news is that most of our bugs are resolved in <=15 days!  I think this is a metric we can track and get better at; ideally we would close all Talos regression bugs in under 30 days.
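If we want to watch that metric over time, a minimal sketch of the computation could look like this; the (filed, resolved) dates below are hypothetical placeholders, and a real version would pull the timestamps for each Talos regression bug out of Bugzilla:

from datetime import date

# Hypothetical (filed, resolved) date pairs; a real run would query Bugzilla
# for the Talos regression bugs instead of hardcoding them.
bugs = [
    (date(2014, 3, 3), date(2014, 3, 10)),
    (date(2014, 3, 5), date(2014, 3, 28)),
    (date(2014, 4, 1), date(2014, 5, 20)),
]

days_open = [(resolved - filed).days for filed, resolved in bugs]
within_15 = sum(1 for d in days_open if d <= 15)
within_30 = sum(1 for d in days_open if d <= 30)

print("resolved in <=15 days: %d of %d" % (within_15, len(days_open)))
print("resolved in <=30 days: %d of %d" % (within_30, len(days_open)))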

Looking over all the bugs we have, what is the status of them?

[chart: status of all Talos regression bugs]

Yay for the blue pacman!  We have a lot of new bugs rather than assigned bugs; that might be something we could adjust by assigning an owner once a bug is confirmed and briefly discussed, but that is still up in the air.

The burning question is what are all the bugs resolved as?

[chart: resolutions of the resolved bugs]

To me this seems healthy, and it is a starting point.  Tracking this over time will probably be a useful metric!

 

In summary, over the last 6 months that we have been tracking this information, many developers have done great work to make improvements and fix regressions.  There are things we can do better, and I want to know:

What information provided today is useful to track regularly?

Is there something you would rather see?

 


The lifecycle of a Talos performance regression

[image: The cycle of landing a change to Firefox that affects performance]


May 8, 2014 · 9:38 am

Hey, SessionRestore- you have a Talos test

As of last Friday, bug 936630 landed, so we now have sessionrestore (and sessionrestore_no_auto_restore) as 2 new tests in the Talos suite (posting results under the magic ‘o’ key on tbpl).  In fact, we have already seen these tests show improvements.

Thanks to Yoric for creating these new tests; give him some karma on IRC!  Please refer to the Talos docs if you want more information on these tests.

 


Performance Alerts – by the numbers

If you have ever received an automated mail about a performance regression, and then 10 more, you are probably frustrated by the volume of alerts.  Six months ago I started looking at the alerts and filing bugs, and 10 weeks ago a little tool was written to help out.

What have I seen in 10 weeks:

1926 alerts on mozilla.dev.tree-management for Talos, resulting in 58 bugs filed (roughly 1 bug per 33 alerts):

[chart: alerts received vs. bugs filed over the 10 weeks]

*Keep in mind that many alerts are improvements, and many are duplicates between trees and between PGO/non-PGO builds.
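For reference, the 1-bug-per-33-alerts figure is just the raw ratio of those two totals; a quick sketch of the arithmetic:

alerts = 1926
bugs_filed = 58
print("alerts per bug filed: %.1f" % (alerts / float(bugs_filed)))  # ~33.2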

 

Now for some numbers as we uplift.  How are we doing from one release to another?  Are we regressing or improving?  These are all questions I would like to answer in the coming weeks.

Firefox 30 uplift, m-c -> Aurora:

  • 26 regressions (4 TART, 4 SVG, 3 TS, Paint, and many more)
    • 2 remaining bugs not resolved as we are now on Beta (bug 990183, bug 990194)

 

Firefox 31 uplift, m-c -> Aurora (tracking bug 990085):

 

Is this useful information?

Are there questions you would rather I answer with this data?

 


Performance Bugs – How to stay on top of Talos regressions

Talos is the framework used for desktop Firefox to measure performance for every patch that gets checked in.  Running tests for every checkin on every platform is great, but who looks at the results?

As I mentioned in a previous blog post, I have been looking at the alerts posted to dev.tree-management and taking action on them when necessary.  I will save discussing my alert manager tool for another day.  One great thing about our alert system is that we send an email to the original patch author if we can determine who it is.  Many developers already take note of these emails and take action on their own; I see many patches backed out or discussed with no one but the developer initiating the action.

So why do we need a Talos alert sheriff?  For the main reason that not even half of the regressions are acted upon.  There are valid reasons for this (wrong patch identified, noisy data, the regression doesn't seem related to the patch), and of course many regressions are ignored due to lack of time.  When I started filing bugs 6 months ago, I incorrectly assumed all of them would be fixed or resolved as WONTFIX for a valid reason.  That happens for most of the bugs, but many regressions simply get forgotten about.

When we did the uplift of Firefox 30 from mozilla-central to mozilla-aurora, we saw 26 regression alerts come in and 4 improvement alerts.  This prompted us to revisit the process of what we were doing and what could be done better.  Here are some of the new things we will be doing:

  • For all regressions found, attempt to find the original bug and reopen/comment in the bug
  • For regressions where it is not easy to find the original bug, we will open a new bug
  • All bugs that have regression information will be marked as blocking a new tracking bug
  • For each release we will create a new tracking bug for all regressions
  • After an uplift from central->aurora, we will ensure we have all alerts mapped to existing regressions

As this process goes through a cycle or two, we will refine it to ensure there is less noise for developers and faster, more accurate tracking of regressions.
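Since every regression bug will block a per-release tracking bug (bug 990085, mentioned above, is the Firefox 31 one), checking the state of an uplift could be scripted.  Here is a rough sketch against the Bugzilla REST API; the endpoint and field names are my assumptions about how bugzilla.mozilla.org exposes this data, not a documented recipe:

import json
from urllib.request import urlopen

BUGZILLA = "https://bugzilla.mozilla.org/rest/bug"
TRACKING_BUG = 990085  # Firefox 31 Talos regression tracking bug

def get_json(url):
    with urlopen(url) as response:
        return json.load(response)

# Assumption: the tracking bug's dependencies are the individual regression bugs.
tracker = get_json("%s/%d?include_fields=depends_on" % (BUGZILLA, TRACKING_BUG))
regression_ids = tracker["bugs"][0]["depends_on"]

if regression_ids:
    ids = ",".join(str(b) for b in regression_ids)
    bugs = get_json("%s?id=%s&include_fields=id,status,resolution,summary" % (BUGZILLA, ids))
    for bug in bugs["bugs"]:
        print(bug["id"], bug["status"], bug["resolution"] or "open", bug["summary"])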

 


tracking talos alerts across branches

A year without blogging, and I am back.  I figured there was some cool stuff to share; here is one tidbit.

In the last year I have picked up looking at Talos results and filing regression bugs for them, which has been useful.  What currently happens is that when results are submitted to g.m.o (graph server), we detect a regression and send out an email to the original patch author (if we can determine who that is) and post to mozilla.dev.tree-management.  I have been using dev.tree-management as a starting point for hunting regressions.  When things are busy it can eat up a couple of hours in a day.  Luckily, many developers are responsible about taking action when they receive the emails.

Given that at least half of the regressions are not acted upon by the original developer, it is important to read the newsgroup.  One of the things that makes this frustrating is that a single regression can generate multiple alerts (regular builds vs. PGO builds, and again as the patch merges between branches/projects).

To make my life easier, I have taken all the alerts on dev.tree-management and put them in a database (local for now).  The final goal is a web UI that lets me easily annotate these alerts, similar to tbpl for random test failures.  One thing I wanted to do was help identify duplicate alerts.  Today, while attempting that, I got a clear picture of what the lifecycle of a regression looks like:

mysql> select date,branch,percent,keyrevision from alerts where test='Paint' and platform='WINNT 6.2 x64' order by date ASC;
+---------------------+-------------------------+---------+--------------+
| date                | branch                  | percent | keyrevision  |
+---------------------+-------------------------+---------+--------------+
| 2014-02-14 19:41:38 | Mozilla-Inbound-Non-PGO | 10.1%   | c7802c9d6eec |
| 2014-02-15 01:03:54 | Fx-Team-Non-PGO         | 9.53%   | 7a3adc5aac28 |
| 2014-02-15 21:43:48 | Mozilla-Inbound         | 10.6%   | c7802c9d6eec |
| 2014-02-16 03:46:12 | Firefox-Non-PGO         | 8.88%   | 5d7caa093f4f |
| 2014-02-16 03:46:13 | B2g-Inbound-Non-PGO     | 9.44%   | 071885f79841 |
| 2014-02-16 14:22:38 | Fx-Team                 | 10.4%   | 7a3adc5aac28 |
| 2014-02-17 04:42:57 | B2g-Inbound             | 10.7%   | 071885f79841 |
| 2014-02-18 11:43:54 | Firefox                 | 9.76%   | eac89fb04bb9 |
+---------------------+-------------------------+---------+--------------+
8 rows in set (0.00 sec)

It is really cool to see how one change can generate alerts over four days.
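As a minimal sketch of the duplicate detection I want the tool to do, here is the same data in an in-memory SQLite table; the schema is simplified compared to my local database, and grouping by key revision only catches the PGO/non-PGO pairs, since merges to other branches land with different revisions:

import sqlite3

# The rows from the query above.
rows = [
    ("2014-02-14 19:41:38", "Mozilla-Inbound-Non-PGO", "10.1%", "c7802c9d6eec"),
    ("2014-02-15 01:03:54", "Fx-Team-Non-PGO",         "9.53%", "7a3adc5aac28"),
    ("2014-02-15 21:43:48", "Mozilla-Inbound",         "10.6%", "c7802c9d6eec"),
    ("2014-02-16 03:46:12", "Firefox-Non-PGO",         "8.88%", "5d7caa093f4f"),
    ("2014-02-16 03:46:13", "B2g-Inbound-Non-PGO",     "9.44%", "071885f79841"),
    ("2014-02-16 14:22:38", "Fx-Team",                 "10.4%", "7a3adc5aac28"),
    ("2014-02-17 04:42:57", "B2g-Inbound",             "10.7%", "071885f79841"),
    ("2014-02-18 11:43:54", "Firefox",                 "9.76%", "eac89fb04bb9"),
]

db = sqlite3.connect(":memory:")
db.execute("create table alerts (date text, branch text, percent text, keyrevision text)")
db.executemany("insert into alerts values (?, ?, ?, ?)", rows)

# Alerts sharing a key revision are the same regression reported for both the
# PGO and non-PGO builds of one branch; cross-branch duplicates need fuzzier
# matching (same test/platform, similar percent, dates close together).
query = """select keyrevision, count(*), min(date), max(date)
           from alerts group by keyrevision order by min(date)"""
for rev, count, first, last in db.execute(query):
    print("%s: %d alerts between %s and %s" % (rev, count, first, last))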

Stay tuned for more information on this and other topics!


Android automation is becoming more stable ~7% failure rate

At Mozilla we have made unit testing on Android devices as important as desktop testing. Earlier today I was asked how we measure this and what our definition of success is. The obvious answer is no failures except for code that actually breaks a test, but in reality we allow for some random failures and infrastructure failures. Our current goal is a 5% failure rate.

So what are these acceptable failures, and what does 5% really mean? Failures can happen when we have tests which fail randomly, usually poorly written tests or tests which were written a long time ago and hacked to work in today's environment. This doesn't mean any test that fails is a problem; it could be a previous test that changes a Firefox preference by accident. For Android testing, this currently means the browser failed to launch and load the test webpage properly, or it crashed in the middle of the test. Other failures include the device losing connectivity, our host machine having hiccups, the network going down, sdcard failures, and many other problems. With our current state of testing, this mostly falls into the category of losing connectivity to the device. Infrastructure problems are indicated as red or purple, and test-related problems are orange.

I took a look at the last 10 runs on mozilla-central (where we build Firefox nightlies from) and built this little graph:

[chart: Firefox Android Failures]

Here you can see that test-related failures account for 6.67% of runs, and overall we can expect a failure on 12.33% of Android runs.
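The percentages are just counts of job results over those runs; here is a sketch of the arithmetic, with job counts chosen to reproduce the mozilla-central numbers above (the real counts come from tbpl and may differ):

# Hypothetical job counts over the last 10 runs, chosen so the math matches
# the percentages quoted above; the real counts come from tbpl.
total_jobs = 300
orange = 20         # test-related failures
red_or_purple = 17  # infrastructure failures

test_rate = 100.0 * orange / total_jobs
infra_rate = 100.0 * red_or_purple / total_jobs
print("test failure rate:  %.2f%%" % test_rate)                 # 6.67%
print("infra failure rate: %.2f%%" % infra_rate)                # 5.67%
print("any failure rate:   %.2f%%" % (test_rate + infra_rate))  # 12.33%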

We have another branch called mozilla-inbound (we merge this into mozilla-central regularly) where most of the latest changes get checked in.  I did the same thing here:

[chart: mozilla-inbound Android Failures]

Here you can see that test-related failures account for 7.77% of runs, and overall we can expect a failure on 9.89% of Android runs.

This is only a small sample of the tests, but it should give you a good idea of where we are.


Some notes about adding new tests to talos

Over the last year and a half I have been editing the Talos harness for various bug fixes, but just recently I have needed to dive in and add new tests and pagesets to Talos for Firefox and Fennec.  Here are some of the things I didn't realize, or had conveniently forgotten, about what goes on behind the scenes.

  • tp4 is really 4 tests: tp4, tp4_nochrome, tp4_shutdown, tp4_shutdown_nochrome.  This is because in the .config file we have “shutdown: true”, which adds _shutdown to the test name, and running with --noChrome adds _nochrome to the test name.  The same goes for any test that is run with the shutdown=true and noChrome options (see the sketch after this list).
  • when adding new tests, we need to add the test information to the graph server (staging and production).  This is done in the hg.mozilla.org/graphs repository by adding to data.sql.
  • when adding new pagesets (as I did for tp4 mobile), we need to provide a .zip of the pages and the pageloader manifest to release engineering as well as modifying the .config file in talos to point to the new manifest file.  see bug 648307
  • Also when adding new pages, we need to add sql for each page we load.  This is also in the graphs repository, in pages_table.sql.
  • When editing the graph server, you need to file a bug with IT to update the live servers and attach a sql file (not a diff).   Some examples: bug 649774 and bug 650879
  • after the graph servers are updated, the staging run is green, and review is done, you can check in the patch for talos
  • For new tests, you also have to create a buildbot config patch to add the testname to the list of tests that are run for talos
  • the last step is to file a release engineering bug to update talos on the production servers.  This is done by creating a .zip of talos, posting it on an ftp site somewhere, and providing a link to it in the bug.
  • one last thing is to make sure the bug to update talos has an owner and is looked at; otherwise it can sit for weeks with no action!
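To illustrate the first bullet, here is a tiny sketch of how one configured test fans out into the names it posts results under, based on the shutdown and noChrome options; this is a simplification for illustration, not the actual Talos harness code:

def result_names(test, shutdown=False, nochrome=False):
    """Return the names a single Talos test posts results under."""
    names = [test]
    if shutdown:
        names.append(test + "_shutdown")
    if nochrome:
        names += [name + "_nochrome" for name in names]
    return names

# tp4 with shutdown: true in the .config and a run using --noChrome:
print(result_names("tp4", shutdown=True, nochrome=True))
# ['tp4', 'tp4_shutdown', 'tp4_nochrome', 'tp4_shutdown_nochrome']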

This is my experience from getting ts_paint, tpaint, and tp4m (mobile only) tests added to Talos over the last couple months.


Talos, Remote Testing and Android

Last week I posted about mochikit.jar and what was done to enable testing on remote systems (specifically Android) for mochitest chrome style tests.  This post will discuss the work done to Talos for remote testing on Android.  I have been working with bear in release engineering a lot to flush out bugs.  Now we are really close to turning this stuff on for the public facing tinderbox builds.

Talos + Remote Testing:

Last year I added all the remote testing bits into Talos for Windows Mobile.  Luckily, this time around I just had to clean up a few odds and ends (adding support for IPC).  Talos is set up to access a webserver and communicate with a SUTAgent (when you set up your .config file properly).  This means you can have a static webserver on your desktop or on the network and run Talos against any SUTAgent from a host machine.

Talos + Android:

This is a harder challenge to resolve than remote testing.  Android does not support redirecting to stdout, which Talos requires.  For Talos and all related tests (fennecmark, pageloader) we need to write to a log file from the test itself.
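Here is a minimal sketch of that pattern, just to show the shape of it: the test writes results to a log file on the device, and the harness pulls the file back over the agent connection and parses it on the host.  The DeviceManager class and the end-of-report marker below are stand-ins for illustration, not the real devicemanager.py or pageloader output:

import time

class DeviceManager(object):
    """Stand-in for the real devicemanager.py/SUTAgent helper."""
    def getFile(self, remote_path, local_path):
        # The real helper copies the file off the device over the agent connection.
        raise NotImplementedError

def wait_for_results(dm, remote_log, local_log, timeout=60, poll=5):
    """Poll the device until the test's log file shows up and looks complete."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            dm.getFile(remote_log, local_log)
            data = open(local_log).read()
            if "END-OF-REPORT" in data:  # placeholder marker, not the real one
                return data
        except (IOError, NotImplementedError):
            pass
        time.sleep(poll)
    return None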

Run it for yourself:

Those are the core changes that needed to be made.  Here are some instructions for running it on your own:

hg clone http://hg.mozilla.org/build/talos

ln -s talos/ /var/www/talos #create a link on http://localhost/talos to the hg clone

python remotePerfConfigurator.py -v -e org.mozilla.fennec -t `hostname` -b mobile-browser --activeTests ts:tgfx --sampleConfig remote.config --browserWait 60 --noChrome --output test.config --remoteDevice 192.168.1.115 --webServer 192.168.1.102/talos

python run_tests.py test.config

* NOTE: 192.168.1.115 is the address of my Android device (SUTAgent), and 192.168.1.102 is the webserver on my desktop


status of the winmo automated tests project

I have been posting about this project for a while, so I figured I should give an update. Currently patches are landing, and we are starting to get the final set of patches ready for review.

  • Talos: This was the first part of this project and we have checked in 3 of the 4 patches to get Talos TS running. There is 1 patch remaining which I need to upload for review
  • Mochitest: There are 4 patches required for this to work:
    1. Fix tests to not use hardcoded localhost – early review stages
    2. Add CLI options to mochitest for remote webserver – I need to clean up my patch for review, at the end game
    3. Add devicemanager.py to the source tree – review started, waiting on sutagent.exe to resolve a few minor bugs
    4. Add runtestsremote.py to the source tree – review process started, waiting on other patches

    Good news is all 4 patches are at the review stage

  • Reftest: This requires 4 patches (1 is devicemanager.py from mochitest)
    1. Modify reftest.jar to support http url for manifest and test files – up for review
    2. Refactor runreftests.py – up for review
    3. Add remotereftests.py to source tree – needs work before review, but WIP posted

    Keep in mind here we are still blocked on registering the reftest extension. I also have instructions for how to set up and run this.

  • Xpcshell: this requires 3 patches (1 is device manager) and is still in WIP stages. There are two pieces to this that we still need to resolve: copying over the xpcshell data to the device and setting up a webserver to serve pages. Here are the two patches to date:
    1. Refactor runxpcshelltests.py to support subclass for winmo – WIP patch posted, close to review stage
    2. Add remotexpcshelltests.py to source tree – WIP patch posted

    I have written some instructions on how to run xpcshell tests on winmo if you are interested.

Stay tuned for updates when we start getting these patches landed and resolving some of our device selection/setup process.
