Category Archives: general

Looking at Firefox performance 57 vs 63

Last November we released Firefox v.57, otherwise known as Firefox Quantum.  Quantum was in many ways a whole new browser, with a focus on speed compared to previous versions of Firefox.

While I write about many topics on my blog, typically related to my current work at Mozilla, I haven't written about measuring or monitoring performance in a while.  Now that we are almost a year out, I thought it would be nice to look at a few of the key performance tests that were important to track for the Quantum release and see what they look like today.

First I will look at the Speedometer benchmark, which we used to track browser performance, primarily of the JS engine and DOM.  For this test we measure the final score produced, so the higher the number the better:

[graph: speedometer]

You can see a large jump in April; that is when we upgraded the hardware we run the tests on.  Otherwise, we have only improved since last year!

Next I want to look at our startup time test (ts_paint), which measures the time to launch the browser from the command line, in ms; in this case lower is better:

[graph: ts_paint]

Here again you can see the hardware upgrade in April; overall we have made this slightly better over the last year!

More interesting is our page load test.  This is always an interesting area and there are many opinions about the right way to do it.  The way we do pageload is to record a page and replay it with mitmproxy.  Lucky for us (thanks to neglect), we have not upgraded our pageset, so we can really compare the same page load from last year to today.

From our initial setup, we have 4 pages that we recorded and have continued to test; all of these are measured in ms, so lower is better.

Amazon.com (measuring time to first non-blank paint):

[graph: amazon]

We see our hardware upgrade in April; otherwise, small improvements over the last year!

Facebook (logged in with a test user account, measuring time to first non-blank paint):

[graph: facebook]

Again, we have the hardware upgrade in April, and overall we have seen a few other improvements 🙂

Google (custom hero element on search results):

[graph: google]

Here you can see that compared to a year ago there have been a few ups and downs; overall we are not seeing gains or clear wins (and yes, the hardware upgrade is visible in April).

Youtube (measuring time to first non-blank paint):

[graph: youtube]

As you can see here, there wasn’t a big change in April with the hardware upgrade, but in the last 2 months we see some noticeable improvements!

In summary, none of our tests have shown regressions.  Does this mean that Firefox v.63 (currently on Beta) is faster than last year's Firefox Quantum release?  I think the graphs here show that it is, but your mileage may vary.  It does help that we are running the same, unchanged tests over time, so we can really compare apples to apples.  There have been changes in the browser and updates to the tools to support other features, including some browser preferences that changed.  We have found that we don't necessarily measure real-world experiences, but we get a good idea of whether we have made things significantly better or worse.

Some examples of how this might be different for you than what we measure in automation:

  • We test in an isolated environment (custom prefs, fresh profile, no network to use, no other apps)
  • The pages we load are outdated, and the live sites have most likely changed in the last year
  • What we measure as startup time or page load time might not reflect what a user actually perceives

 

3 Comments

Filed under general, testdev

Working towards a productive definition of “intermittent orange”

Intermittent Oranges (tests which fail sometimes and pass other times) are an ever-increasing problem with test automation at Mozilla.

While there are many common causes for failures (bad tests, the environment/infrastructure we run on, and bugs in the product), we still do not have a clear definition of what we view as intermittent.  Some common statements I have heard:

  • It’s obvious, if it failed last year, the test is intermittent
  • If it failed 3 years ago, I don’t care, but if it failed 2 months ago, the test is intermittent
  • I fixed the test to not be intermittent, I verified by retriggering the job 20 times on try server

These imply very different definitions of what is intermittent.  A useful definition will need to:

  • determine if we should take action on a test (programmatically or manually)
  • define policy that sheriffs and developers can use to guide work
  • guide developers to know when a new/fixed test is ready for production
  • provide useful data to release and Firefox product management about the quality of a release

Since I wanted a clear definition of what we are working with, I looked over 6 months (2016-04-01 to 2016-10-01) of OrangeFactor data (7330 bugs, 250,000 failures) to find patterns and trends.  I was surprised at how many bugs had <10 instances reported (3310 bugs, 45.1%).  Likewise, I was surprised that such a small number of bugs (1236) accounts for >80% of the failures.  It made sense to look at things daily, weekly, monthly, and every 6 weeks (our typical release cycle).  After much slicing and dicing, I have come up with 4 buckets (a rough sketch of this bucketing follows the list):

  1. Random Orange: this test has failed, even multiple times in history, but in a given 6 week window we see <10 failures (45.2% of bugs)
  2. Low Frequency Orange: this test might fail up to 4 times in a given day, typically <=1 failure per day; in a 6 week window we see <60 failures (26.4% of bugs)
  3. Intermittent Orange: fails up to 10 times/day or <120 times in 6 weeks (11.5% of bugs)
  4. High Frequency Orange: fails >10 times/day on many days and is often seen in try pushes (16.9% of bugs, or 1236 bugs)
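
To make the buckets concrete, here is a minimal sketch in Python of how a bug's failure counts could map onto a bucket; the function name and the exact handling of the per-day peak are illustrative, not actual OrangeFactor code:

# illustrative only: classify a bug using the thresholds described above
# (failures in a 6 week window, plus the worst single-day count)
def classify_orange(failures_in_6_weeks, max_failures_in_a_day):
    if max_failures_in_a_day > 10:
        return "high frequency orange"   # bucket 4
    if failures_in_6_weeks < 10:
        return "random orange"           # bucket 1
    if failures_in_6_weeks < 60 and max_failures_in_a_day <= 4:
        return "low frequency orange"    # bucket 2
    if failures_in_6_weeks < 120:
        return "intermittent orange"     # bucket 3
    return "high frequency orange"       # bucket 4

# example: 35 failures in 6 weeks, at most 3 on any single day
print(classify_orange(35, 3))            # -> low frequency orange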

Alternatively, we could simplify our definitions and use:

  • low priority or not actionable (buckets 1 + 2)
  • high priority or actionable (buckets 3 + 4)

Does defining these buckets by the number of failures in a given time window help with what we are trying to solve with the definition?

  • Determine if we should take action on a test (programmatically or manually):
    • ideally buckets 1/2 can be detected programmatically with autostar and removed from our view, possibly rerunning to validate it isn't a new failure
    • buckets 3/4 have the best chance of reproducing; we can run them in debuggers (like ‘rr’), or triage to the appropriate developer when we have enough information
  • Define policy that sheriffs and developers can use to guide work
    • sheriffs can know when to file bugs (either buckets 2 or 3 as a starting point)
    • developers understand the severity based on the bucket.  We will still need a lot of context, but understanding severity is important.
  • Guide developers to know when a new/fixed test is ready for production
    • If we fix a test, we want to ensure it is stable before we make it tier-1.  A developer can do the math assuming ~300 commits/day and ensure the test passes (see the back-of-the-envelope numbers after this list).
    • NOTE: SETA and coalescing ensure we don't run every test for every push, so we more realistically see ~100 test runs/day
  • Provide useful data to release and Firefox product management about the quality of a release
    • Release Management can take the OrangeFactor into account
    • new features might be required to have a certain volume of tests at or below Random Orange
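
To make the "do the math" step concrete, here is a back-of-the-envelope sketch (my own rough numbers, not a formal policy) of the pass rate a fixed test would need in order to stay at the Random Orange level:

runs_per_day = 100                     # ~300 commits/day, reduced by SETA/coalescing
window_days = 6 * 7                    # one 6 week release cycle
runs_in_window = runs_per_day * window_days      # 4200 runs
max_failures = 9                       # stay under the <10 failures threshold
required_pass_rate = 1 - max_failures / float(runs_in_window)
print("%.2f%%" % (required_pass_rate * 100))     # -> 99.79%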

One other way to look at this is what gets posted in bugs (by the war on orange bugzilla robot).  There are simple rules (roughly sketched in code after the list):

  • 15+ times/day – post a daily summary (bucket #4)
  • 5+ times/week – post a weekly summary (bucket #3/4 – about 40% of bucket 2 will show up here)
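
As a rough sketch (this is not the robot's actual code, just the rules above expressed as a function):

def summary_type(failures_today, failures_this_week):
    if failures_today >= 15:
        return "post a daily summary"    # bucket 4
    if failures_this_week >= 5:
        return "post a weekly summary"   # buckets 3/4, some of bucket 2
    return None                          # not noisy enough to post

print(summary_type(20, 40))              # -> post a daily summary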

Lastly, I would like to cover some exceptions and ways in which some might see this as flawed:

  • missing or incorrect data in orange factor (human error)
  • some issues have many bugs but a single root cause – we could miscategorize a fixable issue

I do not believe adjusting the definition will fix the above issues; different tools or methods for running the tests might reduce the concerns there.

4 Comments

Filed under general, testdev, Uncategorized

5 days in Portland with Mozillians and 10 great things that came from it

I took a lot of notes in Portland last week.  One might not know that, given that I talked so much my voice ran out of steam by the second day.  Either way, in chatting with some co-workers yesterday about what we took away from Portland, I realized that there is a long list of awesomeness.

Let me caveat this by saying that some of these ideas have been talked about in the past, but despite our efforts to work with others and field interesting and useful ideas, there is a big list of great things that came to light while chatting in person:

  • :bgrins mentioned a mozscreenshot tool and the need to get screenshots of new features in development on various platforms so UX can review the changes.  Currently the process is to ask UX to download the build from try or some other location and run it locally to see the changes.
  • :heycam/:jwatt – had a great and interesting Talos discussion, mostly around how to run it and validate patches/fixes locally and on try server (check out bug 1109243).
  • :glandium is looking at doing some changes (I recall something with build/pgo) and wanted to know how to compare some Talos numbers to help make the right decision – this can be done with either bug 1109243, or the existing compare.py in the Talos repo (we might need some cleanup on this)
  • :bobowen has been working to get csb tests working; after chatting in line to board a plane, it became clear he needs to solve some finer-grained test selection problems, many of which the ateam has on a roadmap in Q2/Q3 – I see some tighter collaboration happening here.
  • Thanks to chatting with :lsblakk, I am motivated to expand the talos sheriff team and look for dedicated Mozillians (or soon to become Mozillians) to work with in keeping a lid on the alerts and overall state of performance (based on what we measure).
  • :lightsofapollo had a great conversation with me about TaskCluster and what barriers stood in the way of running Talos on it – this will result in some initial investigation work!
  • :kats was asking me how to generate alerts for areweslimyet.com.  This is very doable by posting data to graph server.
  • After a good session on how to handle intermittents (seems like the same people have this conversation every time a bunch of Mozillians get together), I am motivated to push Titanic further to find the root cause of an intermittent via brute force retriggers (ideally on weekends).  In fact :dbaron has done this a few times in the last month and so have the sheriffs.  This is similar to what we do to verify a talos regression, just with some different parameters.
  • The same conversation about intermittents yielded a stronger desire to look at new tests coming into the system and validate their stability.  The simple solution is to run the job 100 times, verify that the new test didn't have issues, and then leave it alone.  Of course we could get smart and do this for all test_* files that are edited in the tree.  Thanks to :ehsan for spawning this conversation.
  • Discussing the idea of a Talos Sheriff with a few folks, it seems like further conversations are needed with the existing Sheriff team, as well as a chat with :vladan and :avih about what type of policy we should have for existing performance failures which are detected.  I would expect some changes to be made early next year as we have more tests and need more help.  My initial thoughts are specifically about responding to regressions or getting backed out in XX hours.  Yeah, that sounds nasty, but there are probably cut and dry parameters we can set and start enforcing.

Those are 10 specific topics which, despite everybody knowing how to contact me or the ateam to share great ideas or frustrations, only came out of being in the same place at the same time.

Thinking through this, when I see these folks in a real office while working from there for a few days or a week, the conversations seem smaller and not as intense; usually just small talk whilst waiting for a build to finish.  I believe the fact that we were not expected to focus on our day-to-day work and could instead make plans for the future is the real innovation behind getting these topics surfaced.

1 Comment

Filed under general

Say hi to Kaustabh Datta Choudhury, a newer Mozillian

A couple of months ago I ran into :kaustabh93 online as he had picked up a couple of good first bugs.  Since then he has continued to work very hard and submit a lot of great pull requests to Ouija and Alert Manager (here is his github profile).  After working with him for a couple of weeks, I decided it was time to learn more about him, and I would like to share that with Mozilla as a whole:

Tell us about where you live-

I live in a town called Santragachi in West Bengal. The best thing about this place is its ambience. It is not at the heart of the city but the city is easily accessible. That keeps the maddening crowd of the city away and a calm and peaceful environment prevails here.

Tell us about your school-

I completed my schooling from Don Bosco School, Liluah. After graduating from there, now I am pursuing an undergraduate degree in Computer Science & Engineering from MCKV Institute of Engineering.

Right from when it was introduced to me, I was in love with the subject ‘Computer Science’. And introduction to coding was one of the best things that has happened to me so far.

Tell us about getting involved with Mozilla-

I was looking for some exciting real life projects to work on during my vacation & it was then that the idea of contributing to open source projects struck me. I have been using Firefox for many years now and that gave me an idea of where to start looking. Eventually I found the volunteer tab and thus started my wonderful journey at Mozilla.

Right from when I was starting out till now, one thing that I liked very much about Mozilla was that help was always at hand when needed. On my first day, I popped a few questions in the IRC channel #introduction & after getting the basics of where to start out, I started working on Ouija under the guidance of ‘dminor’ & ‘jmaher’. After a few bug fixes there, Dan recommended me to have a look at Alert Manager & I have been working on it ever since. And the experience of working for Mozilla has been great.

Tell us what you enjoy doing-

I really love coding. But apart from it I also am an amateur photographer & enjoy playing computer games & reading books.

Where do you see yourself in 5 years?

In 5 years’ time I prefer to see myself as a successful engineer working on innovative projects & solving problems.

If somebody asked you for advice about life, what would you say?

Rather than following the crowd down the well-worn path, it is always better to explore unchartered territories with a few.

:kaustabh93 is back in school as of this week, but look for activity on bugzilla and github from him.  You will find him online once in a while in various channels; I usually find him in #ateam.

4 Comments

Filed under Community, general

Mozilla A-Team – Unraveling the mystery of Talos – part 1 of a googol

Most people at Mozilla have heard of Talos; if you haven't, Talos is the performance testing framework that runs for every checkin that occurs at Mozilla.

Over the course of the last year I have had the opportunity to extend, modify, retrofit, and rewrite many parts of the harness and tests that make up Talos.

It seems that once or twice a month I get a question about Talos.  Wouldn't it be nice if I documented Talos?  When Alice was the main owner of Talos, she wrote up some great documentation, and as of today I am announcing that it has gone through an update:

https://wiki.mozilla.org/Buildbot/Talos.

Stay tuned as there will be more updates to come as we make the documentation more useful.

3 Comments

Filed under general, testdev

Two failed attempts with technology today, just one of those days

Today I experienced two WTF moments while trying to use computers:

1) BrowserID ended up being a total failure for me

2) Accessing people.mozilla.org is next to impossible when trying to share files across computers

I have heard great things about BrowserID, and today was my first real chance at it.  I had an account on builder.addons.mozilla.org, and this was with my <me>@mozilla.com email address.  It has been a few months since I had been on there and now it uses BrowserID for all access.  Great!!  I had signed up with BrowserID with my <me>@gmail.com address, but that failed to log me in.  So I clicked the ‘add another email address’, and got a verification email in my inbox.  Trying to verify was impossible with some cryptic error messages.  10 minutes later after trying to log in, I finally found my way to #identity and was told to try it again.  It magically worked.  OK, let me log in to my addons account, no luck.  After 15 more minutes of poking around, I found that my @mozilla.com email address worked with BrowserID just fine by testing it on another site, but it still failed on addons.

Here is my take on the problem:

  • BrowserID is supposed to make logging in easier; after 30 minutes of debugging I still cannot log in.
  • There are no useful error or help messages on the BrowserID site, nor on AMO.  How could my mom figure this out?
  • Where in the world is my ‘I forgot my username/password’ link?  Honestly I could have signed up on AMO with a totally random email address and could have been wasting a lot of time.
  • I found it easier to sign up as a new user with a different BrowserID email than to figure out how to log in with my normal account.

My next problem occurs with accessing people.mozilla.org.  I have been using this for 3.5 years on a regular basis.  I put log files up there for people to read, zip files when I want to share some code or a build, and sometimes I create a webpage to outline data.  I depend on this as a workflow since I know of no other file server at Mozilla that I can just scp files up to.  Just this past weekend, some work was done on the server and the permissions got messed up.  This was fixed, then it wasn't, it was fixed and now it isn't.  I can detect patterns and that is a pretty easy pattern to detect.  What really gets me is this message when I log in:

Last login: Thu May 17 18:41:20 2012 from zlb1.dmz.scl3.mozilla.com
All files stored on this server are subject to automated scans.
You shouldn’t store sensitive information on this server, and you should
avoid having production services depend on data stored here.
Files in ~/public_html may be seen by anyone on the Internet.
[jmaher@people1.dmz.scl3 ~]$

Who in their right mind would think that putting files in a folder called ‘public_html’ would not be seen by anyone on the Internet?  I expect tomorrow I will have to sign an NDA to access my people.mozilla.org account.

The big problem here is that I wasted 20 minutes doing a task that I normally do in 2 minutes and delayed getting a perma red test fixed because I couldn’t find a place to upload a fixed talos.zip to.

Enough complaining and ranting and back to work on reftests for android native!

5 Comments

Filed under general, personal, reviews

Professional Development, Improv and your audience

I had the opportunity to attend some really exciting professional development sessions at the All Hands.  Personally I found these very interesting, but I heard a lot of grumbling about how they did not add a lot of value or interest.

One reason I found these interesting is that in a previous life I attended a few years of Improv acting classes and did a short stint of real onstage Improv acting.  Looping back to these professional development sessions, they reminded me of the core concepts we learned in Improv 101.  So if you feel that you missed out, sign up for an Improv class.  Maybe if there are professional development sessions at a future event they could just have an Improv acting class.

Related to the professional development courses, I found that most of these were sparsely attended.  From those who did attend, the courses received great reviews/ratings.  To be fair, the technical tracks that I attended had about the same attendance as the professional development tracks.  Maybe we are not creating sessions that are of interest to our audience?  I know for the technical tracks we just propose something and it magically becomes a session.  I don't recall getting any input into what sessions would be available to me.  Maybe in the future we can do a better job of getting input from the community (a.k.a. the audience)!

1 Comment

Filed under general, reviews

mochikit.jar changes are in mozilla central

Last night we landed the final patches to make mochikit.jar a reality.  This started out as a bug where we would package the mochikit harness + chrome tests into a single .jar file and then, on a remote system, copy that to the application directory and run the tests locally.  It ended up being much more than that; let me explain some of the changes that have taken place.

why change all of this?

In order to test remotely (on mobile devices such as Windows Mobile and Android), where there is no option to run tools and python scripts, we need to put everything the browser needs into the browser itself and launch it remotely.  The solution for tests that are not accessible over the network is to run them from local files.

what is mochikit.jar?

Mochikit.jar is an extension that is installed in the profile and contains all the core files that mochitest (plain, chrome, browser-chrome, a11y) needs to run in a browser.  It doesn't contain any of the external tools such as ssltunnel or the python scripts that set up a webserver.  When you do a build, you will see a new directory in $(objdir)/_tests/testing/mochitest called mochijar.  Feel free to poke around there.  When running as a standalone application, all chrome://mochikit/content calls will use this extension, not a pointer to the local file system.  The original intention of mochikit.jar was to also include test data, but we found that to create an extension using jar.mn we needed a concrete list of files, and that was not reasonable to do for our test files.  So we created tests.jar.

what is tests.jar?

tests.jar is the actual test data for browser-chrome, chrome, and a11y tests.  These are all tests that are not straightforward to access remotely over http, so we run them locally out of a .jar file.  tests.jar is only created when you do a ‘make package-tests’ and ends up in the root of the mochitest directory as tests.jar.  If the harness finds this file, it copies it to the profile and generates a .manifest file for the .jar file; otherwise we generate a plain .manifest file pointing to the filesystem.  Finally we dynamically register tests.manifest from the profile.  Now all your tests will use a chrome://mochitests/content URL instead of chrome://mochikit/content.

What else changed?

A lot of tests had to change to work with this because we had hardcoded chrome://mochikit/content references in our test code and data.  It is fine to have that in there for references to the harness and core utilities, but references to local test data were hardcoded and didn't need to be.  A few tests required some more difficult work, where we had to extract files temporarily to a temp folder in the profile and reference them with a file path.

what do I need to do when writing new tests?

please don’t cut and paste code then change it to reference a data, utility, or other uri that has chrome://mochikit/content/ in it.  If you need to access a file with the full URI or as a file path, here are some tips:

* a mochitest-chrome test that needs to reference a file in the same directory or subdir:
let chromeDir = getRootDirectory(window.location.href);

* a browser-chrome test that needs to reference a file in the same directory or subdir:
//NOTE: gTestPath is set because window.location.href is not always set on browser-chrome tests
let chromeDir = getRootDirectory(gTestPath);

* extracting files to temp and accessing them

  // when the tests are packaged in a .jar, extract them to a temp
  // directory and point rootDir at a file:// path we can load from
  let rootDir = getRootDirectory(gTestPath);
  let jar = getJar(rootDir);
  if (jar) {
    let tmpdir = extractJarToTmp(jar);
    rootDir = "file://" + tmpdir.path + '/';
  }
  loader.loadSubScript(rootDir + "privacypane_tests.js", this);

Leave a comment

Filed under general, testdev

filtering mochitests for remote testing

One major problem we encounter in running all our unittests on Fennec is the large volume of failures and how to manage them.  Currently we have turned off mochitests on the Maemo tinderbox because nobody looked at the results (we still run reftest/crashtest/xpcshell!)

In many of my previous posts I outlined methods for running tests remotely, and that has proven to be very useful.  In order to test this code and continue developing it (without Windows Mobile or a working Android build yet), I have developed a simple python test-agent that can run on a linux box (including an n900).  If you are curious, check it out and watch tests run remotely…it is pretty cool.

So the real problem I need to solve is how to avoid running a given list of tests on a mobile device.  Solving this could get us to green faster and cut the mochitest runtime in half!

In 2008 my solution was Maemkit.  Maemkit is just a small wrapper around the python test runner scripts that does some file (renaming) and directory (splitting into smaller chunks) manipulation to allow for more reliable test runs.  This has worked great and still works.  Enter remote testing, and we would need to hack up Maemkit a lot to accommodate everything.  In addition, a lot of the work Maemkit does is already in the test runners.

Today I have moved the filtering to be a bit more configurable and less obscure.  This is really just a prototype and toolset to solve a problem for me locally, but the idea is something worth sharing.  What I have done is build up a json file (most of this was done automatically with this parsing script) which outlines the tests and has some ‘tags’ that I can filter on:

   {
     "fennec-tags" : {"orange": "", "remote": "", "timeout": ""},
     "name" : "/tests/toolkit/content/tests/widgets/test_tooltip.xul"
   },
   {
     "fennec-results" : {"fail": 0, "todo": 0, "pass": 51},
     "name" : "/tests/MochiKit_Unit_Tests/test_MochiKit-Async.html",
     "note" : []
   }

You can see I can now run or skip tests that are tagged ‘orange’ or ‘timeout’.  Better yet, I can skip tests with fennec-results that match fail > 0 if I want everything to be green.

So I took this a bit further, since I wanted to turn these on or off depending on whether I wanted to run tests or investigate bugs, and I allowed for a filter language to be parsed in mochitest/tests/SimpleTest/setup.js.  I then modified my runtestsremote.py (a subclass of runtests.py) so that when launching mochitest from the command line I could control this like:

#run all tests that match the 'orange' tag
python runtestsremote.py --filter='run:fennec-tags(orange)'

#skip all tests that have the timeout tag
python runtestsremote.py --filter='skip:fennec-tags(timeout)'

#skip all tests that have failures > 0
python runtestsremote.py --filter='skip:fennec-results(fail>0)'

You should now see the power of this filtering, and that with some more detailed thinking we could have a powerful engine to run exactly what we want; a rough sketch of how such a filter could be applied to the json file is below.  Of course this can run on regular mochitest (if you take the code in this patch from runtestsremote.py and add it to runtests.py.in) and run all the orange tests in a loop or something like that.
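
To illustrate the idea, here is a rough sketch of applying a filter expression like the ones above to the json file; the file name and helper functions are made up for illustration, and this is not the actual code in runtestsremote.py or setup.js:

import json
import re

def load_manifest(path):
    with open(path) as f:
        return json.load(f)   # assumes the file is a json array of test entries

def apply_filter(tests, expression):
    # expression looks like 'skip:fennec-tags(timeout)' or 'skip:fennec-results(fail>0)'
    action, rest = expression.split(':', 1)
    key, arg = re.match(r'([\w-]+)\((.+)\)', rest).groups()

    def matches(test):
        if key == 'fennec-tags':
            return arg in test.get('fennec-tags', {})
        if key == 'fennec-results':
            field, threshold = re.match(r'(\w+)>(\d+)', arg).groups()
            return test.get('fennec-results', {}).get(field, 0) > int(threshold)
        return False

    if action == 'skip':
        return [t for t in tests if not matches(t)]
    return [t for t in tests if matches(t)]   # action == 'run'

tests = load_manifest('fennec-tests.json')     # hypothetical file name
to_run = apply_filter(tests, 'skip:fennec-results(fail>0)')
print("%d tests to run" % len(to_run))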

As a note, I previously mentioned a parsing script.   With some cleanup you could automatically create this json filter file based on tinderbox runs and fill in the tags to identify scenarios like orange (failing sometimes), timeouts, focus problems, etc…

Happy filtering!

Leave a comment

Filed under general, testdev

Stair climbing as a sport

This post deviates from my slew of Mozilla automation related posts, but feel free to read along. After moving to a high-rise condo building a year ago, I started taking the stairs (to the 39th floor) instead of the elevator a few times per week. As time went on this became enjoyable and I could make it to the top without falling on the floor with shaky legs, gasping for air, and on the verge of needing life support (just ask my wife about my earlier climbs).

Time to step it up (pardon the pun). I became involved in some of the online groups and noticed a lot of other people getting into the sport. It seems like the last couple of years have seen an explosion of participants, elite competitors (ones who actually run up the stairs), and events in cities all over the world. Earlier this month I took the plunge and signed up for my first race. This is small in comparison to the Sears Tower or the CN Tower, but you have to start somewhere.

Wish me luck in 4 weeks and consider climbing up the stairs next time you are waiting for the elevator, it really is fun.

11 Comments

Filed under general, personal