Tag Archives: performance

Watching the watcher – Some data on the Talos alerts we generate

Which performance regressions do we see at Mozilla, and who monitors them?  I want to answer these questions with a few peeks at the data.  I have written plenty of previous blog posts outlining stats, trends, and the process.  Let’s briefly recap what we do, then look at the breakdown of alerts (not necessarily bugs).

When Talos uploads numbers to graph server, they get stored and eventually run through a calculation loop to find regressions and improvements.  As of Jan 1, 2015, we post these to mozilla.dev.tree-alerts and also email the offending patch author (if they can easily be identified).  A couple of folks (performance sheriffs) look at the alerts and triage them.  If necessary, a bug is filed for further investigation.  This brief recap of what happens to our performance numbers probably doesn’t inspire folks; what is interesting is the actual data we have.
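To make the "calculation loop" a little more concrete, here is a minimal sketch of one way to flag a sustained change in a series of test results.  It is not the actual graph server calculation: the window size, the threshold, the function names, and the assumption that a higher number is worse are all mine, chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(before, after):
    """Welch's t statistic for the difference in means of two samples."""
    spread = sqrt(stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after))
    if spread == 0:
        # Both windows are perfectly flat: any difference in means is a clean step.
        return 0.0 if mean(before) == mean(after) else float("inf")
    return abs(mean(after) - mean(before)) / spread

def find_changes(values, window=5, threshold=7.0):
    """Return (index, kind, percent change) for each point where the mean shifts.

    Assumes higher values are worse (for example, a timing test) and that
    the values are non-zero.  Both window and threshold are illustrative.
    """
    changes = []
    for i in range(window, len(values) - window + 1):
        before = values[i - window:i]
        after = values[i:i + window]
        if welch_t(before, after) >= threshold:
            pct = 100.0 * (mean(after) - mean(before)) / mean(before)
            kind = "regression" if pct > 0 else "improvement"
            changes.append((i, kind, round(pct, 1)))
    return changes
```

Requiring a full window of results after the change is what makes the shift "sustained"; a single noisy data point does not move the mean of the window enough to cross the threshold.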

Let’s start with some basic facts about alerts in the last 12 months:

  • We have collected 8232 alerts!
  • 4213 of those alerts are regressions (the rest are improvements)
  • 3780 of those above alerts have a manually marked status
    • the rest have been programmatically marked as merged and associated with the original
  • 278 bugs have been filed (or 17 alerts/bug)
    • 89 fixed!
    • 61 open!
    • 128 closed without a fix (5 invalid, 8 duplicate, 115 wontfix/worksforme)
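As a quick sanity check on the bug numbers above, the short sketch below confirms that the three groups add up to the 278 filed bugs and works out each group's share.  The dictionary keys are my own labels; only the counts come from the list.

```python
# Counts taken from the list above; the key names are illustrative labels.
bugs = {"fixed": 89, "open": 61, "closed_without_fix": 128}
closed_detail = {"invalid": 5, "duplicate": 8, "wontfix/worksforme": 115}

total = sum(bugs.values())
detail_total = sum(closed_detail.values())

# Share of all filed bugs in each state, as a percentage.
shares = {state: round(100 * count / total, 1) for state, count in bugs.items()}

for state, pct in shares.items():
    print(f"{state}: {bugs[state]} bugs ({pct}%)")
```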

As you can see, this is not a casual hobby; it is a real system that helps us understand and fix hundreds of performance issues.

We generate alerts on a variety of branches.  Here is the breakdown of alerts per branch:

number of regression alerts we have received per branch


There are a few things to keep in mind here: mobile, mozilla-central, and Firefox are the same branch, and the non-pgo entries cover only linux/windows/android, not osx.

That graph is not very inspiring on its own: most of the alerts land first on fx-team and mozilla-inbound, then show up on the other branches as we merge code.  We run more tests and platforms, and land and back out patches more frequently, on mozilla-inbound and fx-team, which is why those branches have a larger number of alerts.

Given that we have so many alerts and have manually triaged them, what state do the alerts end up in?

Current state of alerts


The interesting data point here is that 43% of our alerts are duplicates.  A few reasons for this:

  • we see an alert on non-pgo, then on pgo (we usually mark the pgo ones as duplicates)
  • we see an alert on mozilla-inbound, then the same alert shows up on fx-team, b2g-inbound, and firefox (due to merging)
    • and then later we see the pgo versions on the merged branches
  • sometimes we retrigger or backfill to find the root cause, which often generates a new alert
  • in a few cases we have landed/backed out/landed a patch and we end up with duplicate sets of alerts
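The duplicate patterns above can be sketched as a small grouping step.  This is only an illustration of the idea, not the tool we use: the alert fields (`id`, `test`, `platform`, `cause`, `branch`, `pgo`), the branch order, and the rule for picking the original are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed order in which an alert usually appears as code is merged around.
BRANCH_ORDER = ["mozilla-inbound", "fx-team", "b2g-inbound", "firefox"]

def rank(alert):
    """Sort key: earlier branch first, and non-pgo before pgo on the same branch."""
    branch = alert["branch"]
    position = BRANCH_ORDER.index(branch) if branch in BRANCH_ORDER else len(BRANCH_ORDER)
    return (position, alert["pgo"])

def mark_duplicates(alerts):
    """Return {alert id: "original" or "duplicate"} for a list of alert dicts."""
    groups = defaultdict(list)
    for alert in alerts:
        # Hypothetical grouping key: same test, same platform, same suspected cause.
        groups[(alert["test"], alert["platform"], alert["cause"])].append(alert)

    status = {}
    for group in groups.values():
        ordered = sorted(group, key=rank)
        status[ordered[0]["id"]] = "original"
        for alert in ordered[1:]:
            status[alert["id"]] = "duplicate"
    return status
```

Retriggers and land/backout/land cycles would need extra handling in practice, since the same cause can produce several alerts on one branch; here they simply fall into the same group and all but one are marked as duplicates.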

The last piece of information that I would like to share is the breakdown of alerts per test:

Alerts per test

number of alerts per test (some excluded)

There are a few outliers, but keep in mind that active work was being done in certain areas, which explains the large number of alerts for some tests.  There are 35 different test types, which wouldn’t look good in a single image, so I have excluded retired tests, counters, startup tests, and android tests.

Personally, I am looking forward to the next year as we transition some tools and do some hacking on the reporting, alert generation and overall process.  Thanks for reading!


Tracking Firefox performance as we uplift – the volume of alerts we get

For the last year, I have been focused on ensuring we look at the alerts generated by Talos.  For the last 6 months I have also looked more carefully at the uplifts we do every 6 weeks.  In fact, we would not generate alerts when we uplifted to Beta, because we didn’t run enough tests to verify a sustained regression in a given time window.

Let’s look at the data, specifically the volume of alerts:

Trend of improvements/regressions from Firefox 31 to 36 as we uplift to Aurora


This is a stacked graph; you can read it as Firefox 32 having a lot of improvements and Firefox 33 having a lot of regressions.  I think what is more interesting is how many performance regressions are fixed or added when we go from Aurora to Beta.  There is minimal data available for Beta.  The next image compares alert volume for the same release on Aurora and then on Beta:

Side by side stacked bars for the regressions going into Aurora and then going onto Beta.


One way to interpret the graph above is that we fixed a lot of regressions on Aurora while Firefox 33 was there, but for Firefox 34 we introduced a lot of regressions.

The above is just my interpretation of the data.  Here are links to a more fine-grained view of the data:

As always, if you have questions, concerns, praise, or other great ideas, feel free to chat via this blog or via irc (:jmaher).


A case of the weekends?

Case of the Mondays

What was famous 15 years ago as a case of the Mondays has manifested itself in Talos.  In fact, I wonder why I get so many regression alerts on Mondays compared to other days.  It seems to be more a matter of having less noise in our Talos data on weekends.

Take for example the test case tresize:

* In fact, we see this on other platforms as well: linux32, linux64, osx10.8, and windowsXP.

30 days of linux tresize

Many other tests exhibit this.  What is different about weekends?  Are there just fewer data points?

I do know our volume of tests goes down on weekends, mostly as a side effect of fewer patches being landed on our trees.

Here are some ideas I have to debug this more:

  • Run massive retrigger scripts for Talos on weekends to check whether the number of samples is the problem
  • Reduce the volume of Talos runs on weekdays to check whether the overall system load in the datacenter is the problem
  • Compare the load of the machines (across all branches) and their wait times with the noise we see in certain tests/platforms
  • Look at platforms like windows 7, windows 8, and osx 10.6 to see why they have more noise on weekends or are more stable.  Finding the delta between platforms would help provide answers
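Before running any of those experiments, a first step could be to put a number on the weekday/weekend difference.  The sketch below is one way to do that, assuming the results are available as (date, value) pairs; the function name and the choice of the coefficient of variation as the noise measure are mine.

```python
from datetime import date
from statistics import mean, pstdev

def noise_by_day_type(samples):
    """Summarize (date, value) samples as a count and relative noise per day type."""
    buckets = {"weekday": [], "weekend": []}
    for day, value in samples:
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        kind = "weekend" if day.weekday() >= 5 else "weekday"
        buckets[kind].append(value)

    summary = {}
    for kind, values in buckets.items():
        # Coefficient of variation: population standard deviation over the mean.
        # Assumes non-zero values; with fewer than two samples there is no spread.
        cv = pstdev(values) / mean(values) if len(values) > 1 else None
        summary[kind] = {"count": len(values), "cv": cv}
    return summary
```

Reporting the sample count next to the noise figure matters here, since one of the open questions is whether weekends only look quieter because they have fewer data points.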

If you have ideas on how to uncover this mystery, please speak up.  I would be happy to see this noise gone and to make our automated alerts more useful!


The lifecycle of a Talos performance regression


The cycle of landing a change to Firefox that affects performance


May 8, 2014 · 9:38 am

Performance Alerts – by the numbers

If you have ever received an automated mail about a performance regression, and then 10 more, you probably are frustrated by the volume of alerts.  6 months ago, I started looking at the alerts and filing bugs, and 10 weeks ago a little tool was written to help out.

What have I seen in 10 weeks:

1926 alerts on mozilla.dev.tree-management for Talos resulting in 58 bugs filed (or 1 bug/33 alerts):


* Keep in mind that many alerts are improvements, and many are duplicated between trees and between pgo/nonpgo builds.

 

Now for some numbers as we uplift.  How are we doing from one release to another?  Are we regressing or improving?  These are all questions I would like to answer in the coming weeks.

Firefox 30 uplift, m-c -> Aurora:

  • 26 regressions (4 TART, 4 SVG, 3 TS, Paint, and many more)
    • 2 bugs remain unresolved now that we are on Beta (bug 990183, bug 990194)

 

Firefox 31 uplift, m-c -> Aurora (tracking bug 990085):

 

Is this useful information?

Are there questions you would rather I answer with this data?

 


Performance Bugs – How to stay on top of Talos regressions

Talos is the framework used for desktop Firefox to measure performance for every patch that gets checked in.  Running tests for every checkin on every platform is great, but who looks at the results?

As I mentioned in a previous blog post, I have been looking at the alerts which are posted to dev.tree-management and taking action on them when necessary.  I will save discussing my alert manager tool for another day.  One great thing about our alert system is that we send an email to the original patch author if we can determine who it is.  Even better, many developers already take note of these emails and act on their own: I see many patches backed out or discussed with no one but the developer initiating the action.

So why do we need a Talos alert sheriff?  Mainly because not even half of the regressions are acted upon.  There are valid reasons for this (wrong patch identified, noisy data, the regression doesn’t seem related to the patch), and of course many regressions are ignored due to lack of time.  When I started filing bugs 6 months ago, I incorrectly assumed all of them would be fixed or resolved as wontfix for a valid reason.  This happens for most of the bugs, but many regressions get forgotten.

When we did the uplift of Firefox 30 from mozilla-central to mozilla-aurora, we saw 26 regression alerts come in and 4 improvement alerts.  This prompted us to revisit the process of what we were doing and what could be done better.  Here are some of the new things we will be doing:

  • For all regressions found, attempt to find the original bug and reopen/comment in the bug
  • For regressions where it is not easy to find the original bug, we will open a new bug
  • All bugs that have regression information will be marked as blocking a new tracking bug
  • For each release we will create a new tracking bug for all regressions
  • After an uplift from central->aurora, we will ensure we have all alerts mapped to existing regressions
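The bookkeeping behind the last few steps can be sketched in a few lines.  This is only an illustration with made-up alert and bug identifiers; the function names and the idea of keeping an alert-to-bug mapping in a dictionary are assumptions, not a description of our tooling.

```python
def unmapped_alerts(uplift_alerts, alert_to_bug):
    """Return the alerts from an uplift that are not yet tied to any bug."""
    return [alert for alert in uplift_alerts if alert not in alert_to_bug]

def tracking_dependencies(alert_to_bug):
    """Return the sorted, de-duplicated bugs a release tracking bug should depend on."""
    return sorted(set(alert_to_bug.values()))

# Hypothetical identifiers, for illustration only.
uplift_alerts = ["alert-1", "alert-2", "alert-3", "alert-4"]
alert_to_bug = {"alert-1": 2002, "alert-2": 1001, "alert-3": 2002}

print(unmapped_alerts(uplift_alerts, alert_to_bug))   # alerts still needing a bug
print(tracking_dependencies(alert_to_bug))            # bugs for the tracking bug
```

An empty result from `unmapped_alerts` is the condition the last step in the list is checking for after each uplift.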

As this process goes through a cycle or two, we will refine it to ensure we have less noise for developers and faster, more accurate tracking of regressions.

 


status of the winmo automated tests project

I have been posting about this project for a while, so I figured I should give an update. Currently patches are landing and we are starting to get the final set of patches ready for review.

  • Talos: This was the first part of this project and we have checked in 3 of the 4 patches to get Talos TS running. There is 1 patch remaining which I need to upload for review
  • Mochitest: There are 4 patches required for this to work:
    1. Fix tests to not use hardcoded localhost – early review stages
    2. Add CLI options to mochitest for remote webserver – nearly done, but I need to clean up my patch for review
    3. Add devicemanager.py to the source tree – review started, waiting on sutagent.exe to resolve a few minor bugs
    4. Add runtestsremote.py to the source tree – review process started, waiting on other patches

    The good news is that all 4 patches are at the review stage.

  • Reftest: This requires 4 patches (1 is devicemanager.py from mochitest)
    1. Modify reftest.jar to support http url for manifest and test files – up for review
    2. Refactor runreftests.py – up for review
    3. Add remotereftests.py to source tree – needs work before review, but WIP posted

    Keep in mind that we are still blocked on registering the reftest extension. I also have instructions for how to set up and run this.

  • Xpcshell: this requires 3 patches (1 is device manager) and is still in WIP stages. There are two pieces to this that we still need to resolve: copying over the xpcshell data to the device and setting up a webserver to serve pages. Here are the two patches to date:
    1. Refactor runxpcshelltests.py to support subclass for winmo – WIP patch posted, close to review stage
    2. Add remotexpcshelltests.py to source tree – WIP patch posted

    I have written some instructions on how to run xpcshell tests on winmo if you are interested.

Stay tuned for updates when we start getting these patches landed and resolving some of our device selection/setup process.
