Over the last six months there has been a deep focus on performance in order to release Firefox 57. Hundreds of developers hunted down performance problems, and after thousands of small adjustments we are seeing massive improvements.
Last week I introduced Ionut, who has come on board as a Performance Sheriff. So what do we do on a regular basis when it comes to monitoring performance? In the past I focused on Talos and how many bugs per release we found, fixed, and closed. While that is fun and interesting, we have expanded the scope of sheriffing.
Currently we have many frameworks:
- Talos (old-fashioned perf testing, in-tree, per commit, all desktop platforms; covers startup, benchmarks, and pageload)
- build_metrics (compile time, installer size, sccache hit rate, num_constructors, etc.)
- AWSY (are we slim yet, now in-tree, per commit, measuring memory during heavy pageload activity)
- Autophone (android fennec startup + talos tests, running on 4 different phones, per commit)
- Platform Microbenchmarks (developer-written GTest (C++) code, mostly graphics- and Stylo-specific)
We continue to refine benchmarks and tests on each of these frameworks to ensure we are running on relevant configurations, measuring the right things, and not duplicating data unnecessarily.
Looking at the list of frameworks, we collect 1127 unique data points, alert on them, and file bugs for anything sustained and valid. While the number of unique metrics can change over time, here are the current counts of metrics we track:
We generate these metrics for every commit (or every few commits, to reduce load); when a metric shifts, we detect a regression and generate an alert. In fact we have generated a sizable number of alerts in the last 6 weeks:
Alerts are not really what we file bugs on; instead we have an alert summary, which can (and typically does) contain a set of alerts. Here is the total number of alert summaries (i.e. what a sheriff will look at):
These alert summaries are then mapped into bugs (or downstream alerts to where the alerts started). Here is a breakdown of the bugs we have:
This indicates there are 73 bugs associated with Performance Summaries. What is deceptive here is that many of those bugs are 'improvements', not 'regressions'. Yes, we associate improvements with bugs too, and we try to comment in the bugs to let you know the impact your code has on a [set of] metric[s].
That is a much smaller number of bugs. There are a few quirks here:
- some regressions show up across multiple frameworks (reduces to 43 total)
- some bugs that are 'downstream' are marked against the root cause instead of just being marked as downstream. Often this happens when we are sheriffing bugs and a downstream alert shows up a couple of days later.
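The first quirk above, counting distinct bugs across frameworks, can be sketched in a few lines of Python. The framework names are real, but the bug numbers and rows are hypothetical, purely to illustrate why per-framework regression counts overstate the true number of bugs:

```python
# Hypothetical per-framework regressions: the same root-cause bug can be
# filed from alerts in more than one framework.
regression_bugs = [
    ("talos", 1400001),
    ("awsy", 1400001),          # same root cause seen in a second framework
    ("build_metrics", 1400002),
    ("autophone", 1400003),
]

# Counting rows counts one regression per framework; counting distinct
# bug numbers gives the deduplicated total.
unique_bugs = {bug for _, bug in regression_bugs}
print(len(regression_bugs))  # 4 per-framework regressions
print(len(unique_bugs))      # 3 distinct bugs after deduplication
```

This is why the raw query shows more regressions than the deduplicated count reported below.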
Over the last few releases here are the tracking bugs:
- Firefox 55 – 56 regressions
- Firefox 56 – 65 regressions
- Firefox 57 – 50 regressions
- Firefox 58 – 28 regressions (to date – 1.5 weeks left)
Note that Firefox 58 has 28 bugs associated with it, but we have 43 bugs from the above query. Some of those bugs from the above query are related to Firefox 57, and some are starred against a duplicate bug or a root cause bug instead of the regression bug.
I hope you find this data useful and informative towards understanding what goes on with all the performance data.