Keeping an eye on Performance alerts

Over the last 6 months there has been a deep focus on performance in preparation for the Firefox 57 release. Hundreds of developers sought out performance improvements, and after thousands of small adjustments we are seeing massive improvements.

Last week I introduced Ionut, who has joined as a Performance Sheriff. So what do we do on a regular basis when it comes to monitoring performance? In the past I focused on Talos and how many bugs per release we found, fixed, and closed. While that is fun and interesting, we have since expanded the scope of sheriffing.

Currently we have many frameworks:

  • Talos (old-fashioned perf testing, in-tree, per commit, all desktop platforms: startup, benchmarks, pageload)
  • build_metrics (compile time, installer size, sccache hit rate, num_constructors, etc.)
  • AWSY (Are We Slim Yet, now in-tree, per commit, measuring memory during heavy pageload activity)
  • Autophone (Android Fennec startup + Talos tests, running on 4 different phones, per commit)
  • Platform Microbenchmarks (developer-written GTest tests in C++, mostly graphics and Stylo specific)

We continue to refine benchmarks and tests on each of these frameworks to ensure we are running on relevant configurations, measuring the right things, and not duplicating data unnecessarily.

Across these frameworks we collect 1127 unique data points, alert on them, and file bugs for anything sustained and valid. While the number of unique metrics can change, here is the current number of metrics we track:

Framework                  Total Metrics
Talos                      624
Autophone                  19
Build Metrics              172
AWSY                       83
Platform Microbenchmarks   229
Total                      1127

We generate these metrics for every commit (or every few commits, for load reasons), and when we detect a regression (or an improvement) we generate an alert. In fact, we have had a sizable number of alerts in the last 6 weeks:

Framework                  Total Alerts
Talos                      429
Autophone                  77
Build Metrics              264
AWSY                       85
Platform Microbenchmarks   227
Total                      1082
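The post does not spell out how a regression is detected. As a minimal sketch, assuming a Perfherder-style approach that compares a window of results before each push with a window after it using a t statistic (the window sizes of 12 and the threshold of 7 are assumptions for illustration, not values stated in this post), detection of a sustained shift could look like this:

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(before, after):
    """Absolute Welch's t statistic between two windows of results.

    Each window needs at least two samples for statistics.variance().
    """
    denom = sqrt(variance(before) / len(before) + variance(after) / len(after))
    diff = abs(mean(after) - mean(before))
    if denom == 0:
        # Both windows are perfectly flat: any difference is a clean step.
        return float("inf") if diff else 0.0
    return diff / denom


def detect_shifts(values, back_window=12, fore_window=12, threshold=7.0):
    """Return indices where results shift and stay shifted (a sustained change).

    back_window, fore_window and threshold are illustrative assumptions.
    """
    candidates = []
    for i in range(back_window, len(values) - fore_window + 1):
        t = t_statistic(values[i - back_window:i], values[i:i + fore_window])
        if t >= threshold:
            candidates.append((i, t))

    # Neighbouring indices around one real shift all score highly; keep only
    # the strongest index in each consecutive run so one shift -> one alert.
    alerts, run = [], []
    for i, t in candidates:
        if run and i != run[-1][0] + 1:
            alerts.append(max(run, key=lambda c: c[1])[0])
            run = []
        run.append((i, t))
    if run:
        alerts.append(max(run, key=lambda c: c[1])[0])
    return alerts


# Two regimes with a little alternating noise: ~100 for 20 pushes, then ~110.
series = [100.0, 101.0] * 10 + [110.0, 111.0] * 10
print(detect_shifts(series))  # prints [20]
```

Requiring a full window on both sides is what makes an alert "sustained": a single noisy data point cannot move the mean of a whole window far enough to cross the threshold.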

Alerts are not really what we file bugs on; instead, we have an alert summary, which can (and typically does) contain a set of alerts. Here is the total number of alert summaries (i.e. what a sheriff will look at):

Framework                  Total Summaries
Talos                      172
Autophone                  54
Build Metrics              79
AWSY                       29
Platform Microbenchmarks   136
Total                      470
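Assuming, for illustration, that a summary gathers the alerts one framework raised against a single push (the class and field names below are hypothetical, not Perfherder's actual schema), the grouping step might be sketched as:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Alert:
    framework: str         # e.g. "Talos", "AWSY"
    test: str              # metric name (hypothetical in the example below)
    revision: str          # push at which the shift was detected
    percent_change: float  # positive = value went up


@dataclass
class AlertSummary:
    framework: str
    revision: str
    alerts: list = field(default_factory=list)


def summarize(alerts):
    """Group alerts sharing a framework and a push into one summary each."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.framework, alert.revision)].append(alert)
    # dicts keep insertion order, so summaries come out in first-seen order
    return [AlertSummary(fw, rev, items) for (fw, rev), items in groups.items()]


# Hypothetical alerts, for illustration only.
alerts = [
    Alert("Talos", "pageload_a", "abc123", 4.2),
    Alert("Talos", "startup_b", "abc123", 2.9),
    Alert("AWSY", "memory_c", "abc123", 3.1),
]
for s in summarize(alerts):
    print(s.framework, s.revision, len(s.alerts))
```

Three alerts from one push become two summaries here, which is why a sheriff reviews far fewer summaries than there are alerts.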

These alert summaries are then mapped to bugs (or marked as downstream of the summary where the alerts started). Here is a breakdown of the bugs we have:

Framework                  Total Bugs
Talos                      41
Autophone                  3
Build Metrics              17
AWSY                       6
Platform Microbenchmarks   6
Total                      73
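The tables so far describe a funnel from metrics to alerts to summaries to bugs. This back-of-the-envelope script (not part of our tooling) uses only the numbers from this post to confirm the totals and show, per framework, how many alerts fold into each summary and how many summaries there are per bug:

```python
# Counts copied from the tables in this post.
# Columns: (metrics, alerts, summaries, bugs)
FRAMEWORKS = {
    "Talos": (624, 429, 172, 41),
    "Autophone": (19, 77, 54, 3),
    "Build Metrics": (172, 264, 79, 17),
    "AWSY": (83, 85, 29, 6),
    "Platform Microbenchmarks": (229, 227, 136, 6),
}


def totals(table):
    """Sum each column across all frameworks."""
    return tuple(sum(col) for col in zip(*table.values()))


def funnel_ratios(table):
    """Alerts per summary and summaries per bug, rounded to 2 decimals."""
    return {
        name: (round(alerts / summaries, 2), round(summaries / bugs, 2))
        for name, (_, alerts, summaries, bugs) in table.items()
    }


print("Totals:", totals(FRAMEWORKS))  # prints Totals: (1127, 1082, 470, 73)
for name, (a_per_s, s_per_b) in funnel_ratios(FRAMEWORKS).items():
    print(f"{name:26} {a_per_s:5.2f} alerts/summary {s_per_b:6.2f} summaries/bug")
```

Frameworks such as Platform Microbenchmarks (about 22.67 summaries per bug) produce many summaries that never become a bug of their own, for example because they are downstream of another summary.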

This indicates there are 73 bugs associated with performance alert summaries. What is deceptive here is that many of those bugs are ‘improvements’, not ‘regressions’. As you may have figured out, we do associate improvements with bugs, and we try to comment in those bugs to let you know the impact your code has on one or more metrics. Counting only the bugs filed for regressions gives this breakdown:

Framework                  Regression Bugs
Talos                      23
Autophone                  3
Build Metrics              14
AWSY                       4
Platform Microbenchmarks   3
Total                      47
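Whether an alert is an improvement or a regression depends on the direction of the change and on whether lower is better for that metric (load times, memory, installer size) or higher is better (benchmark scores). A minimal sketch, with hypothetical class, field, and bug names, of splitting bugs into regression bugs the way the table above does:

```python
from dataclasses import dataclass


@dataclass
class Alert:
    test: str
    prev_value: float
    new_value: float
    lower_is_better: bool  # True for times, memory, sizes; False for scores

    @property
    def is_regression(self) -> bool:
        """A change is a regression when it moves in the 'worse' direction."""
        if self.lower_is_better:
            return self.new_value > self.prev_value
        return self.new_value < self.prev_value


def regression_bugs(alerts_by_bug):
    """Return the bugs linked to at least one regressing alert."""
    return sorted(bug for bug, alerts in alerts_by_bug.items()
                  if any(a.is_regression for a in alerts))


# Hypothetical bug numbers and values, for illustration only.
alerts_by_bug = {
    1000001: [Alert("pageload", 250.0, 240.0, lower_is_better=True)],       # faster: improvement
    1000002: [Alert("benchmark", 100.0, 90.0, lower_is_better=False)],      # lower score: regression
    1000003: [Alert("installer size", 50.0, 52.0, lower_is_better=True)],   # bigger: regression
}
print(regression_bugs(alerts_by_bug))  # prints [1000002, 1000003]
```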

This is a much smaller number of bugs, though there are a few quirks:

  • some regressions show up across multiple frameworks (counting each bug only once reduces the total to 43)
  • some bugs that are ‘downstream’ are marked against the root cause instead of simply being marked as downstream. This often happens when we are sheriffing bugs and a downstream alert shows up a couple of days later.
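Both quirks come down to bookkeeping. Here is a minimal sketch using hypothetical bug and summary IDs (the function names and data layout are assumptions): counting a bug once across frameworks is a set union, the kind of de-duplication that takes the per-framework total of 47 down to 43, and attributing a late downstream alert means following ‘downstream’ links back to the summary where the alerts started.

```python
def unique_bugs(bugs_by_framework):
    """Union the per-framework bug sets so a bug seen by several frameworks counts once."""
    return set().union(*bugs_by_framework.values())


def root_summary(summary_id, downstream_of):
    """Follow 'downstream' links to the summary where the alerts started."""
    seen = set()
    while summary_id in downstream_of:
        if summary_id in seen:
            raise ValueError(f"downstream cycle at summary {summary_id}")
        seen.add(summary_id)
        summary_id = downstream_of[summary_id]
    return summary_id


# Hypothetical IDs, for illustration only.
bugs_by_framework = {"Talos": {1, 2, 3}, "AWSY": {3, 4}, "Build Metrics": {2}}
per_framework = sum(len(b) for b in bugs_by_framework.values())
print(per_framework, len(unique_bugs(bugs_by_framework)))  # prints 6 4

downstream_of = {30: 20, 20: 10}  # 30 is downstream of 20, which is downstream of 10
print(root_summary(30, downstream_of))  # prints 10
```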

Over the last few releases here are the tracking bugs:

Note that the Firefox 58 tracking bug has 28 bugs associated with it, but the above query returns 43 bugs. Some of the bugs from the query are related to Firefox 57, and some are starred against a duplicate bug or a root-cause bug instead of the regression bug.

I hope you find this data useful for understanding what goes on with all of our performance data.


Filed under testdev
