Tag Archives: talos

Watching the watcher – Some data on the Talos alerts we generate

What are the performance regressions at Mozilla- who monitors them and what kind of regressions do we see?  I want to answer this question with a few peeks at the data.  There are plenty of previous blog posts I have done outlining stats, trends, and the process.  Lets recap what we do briefly, then look at the breakdown of alerts (not necessarily bugs).

When Talos uploads numbers to graph server they get stored and eventually run through a calculation loop to find regressions and improvements.  As of Jan 1, 2015, we upload these to mozilla.dev.tree-alerts as well as email to the offending patch author (if they can easily be identified).  There are a couple folks (performance sheriffs) who look at the alerts and triage them.  If necessary a bug is filed for further investigation.  Reading this brief recap of what happens to our performance numbers probably doesn’t inspire folks, what is interesting is looking at the actual data we have.

Lets start with some basic facts about alerts in the last 12 months:

  • We have collected 8232 alerts!
  • 4213 of those alerts are regressions (the rest are improvements)
  • 3780 of those above alerts have a manually marked status
    • the rest have been programatically marked as merged and associated with the original
  • 278 bugs have been filed (or 17 alerts/bug)
    • 89 fixed!
    • 61 open!
    • 128 (5 invalid, 8 duplicate, 115 wontfix/worksforme)

As you can see this is not a casual hobby, it is a real system helping out in fixing and understanding hundreds of performance issues.

We generate alerts on a variety of branches, here is the breakdown of branches and alerts/branch;

number of regression alerts we have received per branch

number of regression alerts we have received per branch

There are a few things to keep in mind here, mobile/mozilla-central/Firefox are the same branch, and for non-pgo branches that is only linux/windows/android, not osx. 

Looking at that graph is sort of non inspiring, most of the alerts will land on fx-team and mozilla-inbound, then show up on the other branches as we merge code.  We run more tests/platforms and land/backout stuff more frequently on mozilla-inbound and fx-team, this is why we have a larger number of alerts.

Given the fact we have so many alerts and have manually triaged them, what state the the alerts end up in?

Current state of alerts

Current state of alerts

The interesting data point here is that 43% of our alerts are duplicates.  A few reasons for this:

  • we see an alert on non-pgo, then on pgo (we usually mark the pgo ones as duplicates)
  • we see an alert on mozilla-inbound, then the same alert shows up on fx-team,b2g-inbound,firefox (due to merging)
    • and then later we see the pgo versions on the merged branches
  • sometimes we retrigger or backfill to find the root cause, this generates a new alert many times
  • in a few cases we have landed/backed out/landed a patch and we end up with duplicate sets of alerts

The last piece of information that I would like to share is the break down of alerts per test:

Alerts per test

number of alerts per test (some excluded)

There are a few outliers, but we need to keep in mind that active work was being done in certain areas which would explain a lot of alerts for a given test.  There are 35 different test types which wouldn’t look good in an image, so I have excluded retired tests, counters, startup tests, and android tests.

Personally, I am looking forward to the next year as we transition some tools and do some hacking on the reporting, alert generation and overall process.  Thanks for reading!

1 Comment

Filed under testdev

A case of the weekends?

Case of the Mondays

What was famous 15 years ago as a case of the Mondays has manifested itself in Talos.  In fact, I wonder why I get so many regression alerts on Monday as compared to other days.  It is more to a point of we have less noise in our Talos data on weekends.

Take for example the test case tresize:


* in fact we see this on other platforms as well linux32/linux64/osx10.8/windowsXP

30 days of linux tresize

Many other tests exhibit this.  What is different about weekends?  Is there just less data points?

I do know our volume of tests go down on weekends mostly as a side effect of less patches being landed on our trees.

Here are some ideas I have to debug this more:

  • Run massive retrigger scripts for talos on weekends to validate # of samples is/isnot the problem
  • Reduce the volume of talos on weekdays to validate the overall system load in the datacenter is/isnot the problem
  • compare the load of the machines with all branches and wait times to that of the noise we have in certain tests/platforms
  • Look at platforms like windows 7, windows 8, and osx 10.6 as to why they have more noise on weekends or are more stable.  Finding the delta in platforms would help provide answers

If you have ideas on how to uncover this mystery, please speak up.  I would be happy to have this gone and make any automated alerts more useful!


Filed under testdev

Are there any trends in our Talos regression bugs?

Now that we have a better process for taking action on Talos alerts and pushing them to resolution, it is time to take a step back and see if any trends show up in our bugs.

First I want to look at bugs filed/week:


This is fun to see, now what if we stack this up side by side with the alerts we receive:


We started tracking alerts halfway through this process.  We show that for about 1 out of every 25 alerts we file a bug.  I had previously stated it was closer to 1/33 alerts (it appears that is averaging out the first few weeks).

Lets see where these bugs are filed, here is a view of the different bugzilla products:


The Testing product is used to file bugs that we cannot figure out the exact changeset, so they get filed in testing::talos.  As there are almost 30 unique components bugs are filed in, I took a few minutes to look at the Core product, here is where the bugs live in Core:


Pardon my bad graphing attempt here with the components cut off.  Graphics is the clear winner for regressions (with “graphics: layers” being a large part of it).  Of course the Javascript Engine and DOM would be there (a lot of our tests are sensitive to changes here).  This really shows where our test coverage is more than where bad code lives. 

Now that I know where the bugs are, here is a view of how long the bugs stay open:


The fantastic news is most of our bugs are resolved in <=15 days!  I think this is a metric we can track and get better at- ideally closing all Talos regression bugs in <30 days.

Looking over all the bugs we have, what is the status of them?


Yay for the blue pacman!  We have a lot of new bugs instead of assigned bugs, that might be something we could adjust and assign owners once it is confirmed and briefly discussed- that is still up in the air.

The burning question is what are all the bugs resolved as?


To me this seems healthy, it is a starting point.  Tracking this over time will probably be a useful metric!


In summary, many developers have done great work to make improvements and fix patches over the last 6 months that we have been tracking this information.  There are things we can do better, I want to know-

What information provided today is useful to track regularly?

Is there something you would rather see?



Filed under Uncategorized

The lifecycle of a Talos performance regression

The lifecycle of a Talos performance regression

The cycle of landing a change to Firefox that affects performance

Leave a comment

May 8, 2014 · 9:38 am

Hey, SessionRestore- you have a Talos test

As of last Friday bug 936630 landed so we now have sessionrestore (and sessionrestore_no_auto_restore) as 2 new tests in the Talos suite (posting results under they magic ‘o’ key on tbpl).  In fact we have already seen this test show improvements.

Thanks to Yoric for creating these new tests, give him some karma on irc!  Please refer to the talos docs if you want more information on these tests.


Leave a comment

Filed under Uncategorized

Performance Alerts – by the numbers

If you have ever received an automated mail about a performance regression, and then 10 more, you probably are frustrated by the volume of alerts.  6 months ago, I started looking at the alerts and filing bugs, and 10 weeks ago a little tool was written to help out.

What have I seen in 10 weeks:

1926 alerts on mozilla.dev.tree-management for Talos resulting in 58 bugs filed (or 1 bug/33 alerts):


*keep in mind that many alerts are improvements, as well as duplicated between trees and pgo/nonpgo


Now for some numbers as we uplift.  How are we doing from one release to another?  Are we regressing, Improving?  These are all questions I would like to answer in the coming weeks.

Firefox 30 uplift, m-c -> Aurora:

  • 26 – regressions (4 TART, 4 SVG, 3 TS, Paint, and many more)
    • 2 remaining bugs not resolved as we are now on Beta (bug 990183, bug 990194)


Firefox 31 uplift, m-c -> Aurora (tracking bug 990085):


Is this useful information?

Are there questions you would rather I answer with this data?



Filed under Uncategorized

Performance Bugs – How to stay on top of Talos regressions

Talos is the framework used for desktop Firefox to measure performance for every patch that gets checked in.  Running tests for every checkin on every platform is great, but who looks at the results?

As I mentioned in a previous blog post, I have been looking at the alerts which are posted to dev.tree-management, and taking action on them if necessary.  I will save discussing my alert manager tool for another day.  One great thing about our alert system is that we send an email to the original patch author if we can determine who it is.  What is great is many developers already take note of this and take actions on their own.  I see many patches backed out or discussed with no one but the developer initiating the action.

So why do we need a Talos alert sheriff?  For the main reason that not even half of the regressions are acted upon.  There are valid reasons for this (wrong patch identified, noisy data, doesn’t seem related to the patch) and of course many regressions are ignored due to lack of time.  When I started filing bugs 6 months ago, I incorrectly assumed all of them would be fixed or resolved as wontfix for a valid reason.  This happens for most of the bugs, but many regressions get forgotten about.

When we did the uplift of Firefox 30 from mozilla-central to mozilla-aurora, we saw 26 regression alerts come in and 4 improvement alerts.  This prompted us to revisit the process of what we were doing and what could be done better.  Here are some of the new things we will be doing:

  • For all regressions found, attempt to find the original bug and reopen/comment in the bug
  • For some regressions that it is not easy to find the original bug, we will open a new bug
  • All bugs that have regression information will be marked as blocking a new tracking bug
  • For each release we will create a new tracking bug for all regressions
  • After an uplift from central->aurora, we will ensure we have all alerts mapped to existing regressions

As this process goes through a cycle or two, we will refine it to ensure we have less noise for developers and more accuracy in tracking regressions faster


Leave a comment

Filed under Uncategorized