Category Archives: testdev

Please welcome the Dashboard Hacker team

A few weeks ago we announced that we would be looking for committed contributors for 8+ weeks on Perfherder.  We have found a few great individuals, all of whom show a lot of potential and make a great team.

They are becoming familiar with the code base and are already making a dent in the initial list of work set aside.  Let me introduce them (alphabetical order by nicks):

akhileshpillai – Akhilesh has jumped right in and started fixing bugs in Perfherder.  He is new to Mozilla and will fit right in.  With how fast he has come up to speed, we are looking forward to what he will be delivering in the coming weeks.  We have a lot of UI workflow as well as backend data refactoring work on our list, all of which he will be a great contributor towards.

mikeling – mikeling has been around Mozilla for roughly two years, recently he started out helping with a few projects on the A*Team.  He is very detailed oriented, easy to work with and is willing to tackle big things.

theterabyte – Tushar is excited about this program as an opportunity to grow his skills as a python developer and experiencing how software is built outside of a classroom.  Tushar will get a chance to grow his UI skills on Perfherder by making the graphs and compare view more polished and complete, while helping out with an interface for the alerts.

Perfherder will soon become the primary dashboard for all our performance needs.

I am looking forward to the ideas and solutions these new team members bring to the table.  Please join me in welcoming them!

Leave a comment

Filed under Community, testdev

Please join me in welcoming the DX Team

A few weeks ago, I posted a call out for people to reach out and commit to participate for 8+ weeks.  There were two projects and one of them was Developer Experience.  Since then we have had some great interest, there are 5 awesome contributors participating (sorted by irc nicks).

BYK – I met BYK 3+ years ago on IRC- he is a great person and very ambitious.  As a more senior developer he will be focused primarily on improving interactions with mach.  While there are a lot of little things to make mach better, BYK proposed a system to collect information about how mach is used.

gma_fav – I met gma_fav on IRC when she heard about the program.  She has a lot of energy, seems very detail oriented, asks good questions, and brings fresh ideas to the team!  She is a graduate of the Ascend project and is looking to continue her growth in development and open source.  Her primary focus will be on the interface to try server (think the try chooser page, extension, and taking other experiments further).

kaustabh93 – I met Kaustabh on IRC about a year ago and since then he has been a consistent friend and hacker.  He attends university.  In fact I do owe him credit for large portions of alert manager.  While working on this team, he will be focused on making run-by-dir a reality.  There are two parts: getting the tests to run green, and reducing the overhead of the harness.

sehgalvibhor – I met Vibhor on IRC about 2 weeks ago.  He was excited about the possibility of working on this project and jumped right in.  Like Kaustabh, he is a student who is just finishing up exams this week.  His primary focus this summer will be working in a similar role to Stanley in making our test harnesses act the same and more useful.

stanley – When this program was announced Stanley was the first person to ping me on IRC.  I have found him to be very organized, a pleasure to chat with and he understands code quite well.  Coding and open source are both new things to Stanley and we have the opportunity to give him a great view of it.  Stanley will be focusing on making the commands we have for running tests via mach easier to use and more unified between harnesses.

Personally I am looking forward to seeing the ambition folks have translate into great solutions, learning more about each person, and sharing with Mozilla as a whole the great work they are doing.

Take a few moments to say hi to them online.

Leave a comment

Filed under Community, testdev

the orange factor – no need to retrigger this week

last week I did another round of re-triggering for a root cause and found some root causes!  This week I got an email from orange factor outlining the top 10 failures on the trees (as we do every week).

Unfortunately as of this morning there is no work for me to do- maybe next week I can hunt.

Here is the breakdown of bugs:

  • Bug 1081925 Intermittent browser_popup_blocker.js
    • investigated last week, test is disabled by a sheriff
  • Bug 1118277 Intermittent browser_popup_blocker.js
    • investigated last week, test is disabled by a sheriff
  • Bug 1096302 Intermittent test_collapse.html
    • test is fixed!  already landed
  • Bug 1121145 Intermittent browser_panel_toggle.js
    • too old!  problem got worse on April 24th
  • Bug 1157948 DMError: Non-zero return code for command
    • too old!  most likely a harness/infra issue
  • Bug 1166041 Intermittent LeakSanitizer
    • patch is already on this bug
  • Bug 1165938 Intermittent media-source
    • disabled the test already!
  • Bug 1149955 Intermittent Win8-PGO test_shared_all.py
    • too old!
  • Bug 1160008 Intermittent testVideoDiscovery
    • too old!
  • Bug 1137757 Intermittent Linux debug mochitest-dt1 command timed out
    • harness infra, test chunk is taking too long- problem is being addressed with more chunks.

As you can see there isn’t much to do here.  Maybe next week we will have some actions we can take.  Once I have about 10 bugs investigated I will summarize the bugs, related dates, and status, etc.

Leave a comment

Filed under testdev

A-Team contribution opportunity – Dashboard Hacker

I am excited to announce a new focused project for contribution – Dashboard Hacker.  Last week we gave a preview that today we would be announcing 2 contribution projects.  This is an unpaid program where we are looking for 1-2 contributors who will dedicate between 5-10 hours/week for at least 8 weeks.  More time is welcome, but not required.

What is a dashboard hacker?

When a developer is ready to land code, they want to test it. Getting the results and understanding the results is made a lot easier by good dashboards and tools. For this project, we have a starting point with our performance data view to fix up a series of nice to have polish features and then ensure that it is easy to use with a normal developer workflow. Part of the developer work flow is the regular job view, If time permits there are some fun experiments we would like to implement in the job view.  These bugs, features, projects are all smaller and self contained which make great projects for someone looking to contribute.

What is required of you to participate?

  • A willingness to learn and ask questions
  • A general knowledge of programming (most of this will be in javascript, django, angularJS, and some work will be in python.
  • A promise to show up regularly and take ownership of the issues you are working on
  • Good at solving problems and thinking out of the box
  • Comfortable with (or willing to try) working with a variety of people

What we will guarantee from our end:

  • A dedicated mentor for the project whom you will work with regularly throughout the project
  • A single area of work to reduce the need to get up to speed over and over again.
    • This project will cover many tools, but the general problem space will be the same
  • The opportunity to work with many people (different bugs could have a specific mentor) while retaining a single mentor to guide you through the process
  • The ability to be part of the team- you will be welcome in meetings, we will value your input on solving problems, brainstorming, and figuring out new problems to tackle.

How do you apply?

Get in touch with us either by replying to the post, commenting in the bug or just contacting us on IRC (I am :jmaher in #ateam on irc.mozilla.org, wlach on IRC will be the primary mentor).  We will point you at a starter bug and introduce you to the bugs and problems to solve.  If you have prior work (links to bugzilla, github, blogs, etc.) that would be useful to learn more about you that would be a plus.

How will you select the candidates?

There is no real criteria here.  One factor will be if you can meet the criteria outlined above and how well you do at picking up the problem space.  Ultimately it will be up to the mentor (for this project, it will be :wlach).  If you do apply and we already have a candidate picked or don’t choose you for other reasons, we do plan to repeat this every few months.

Looking forward to building great things!

2 Comments

Filed under Community, testdev

A-Team contribution opportunity – DX (Developer Ergonomics)

I am excited to announce a new focused project for contribution – Developer Ergonomics/Experience, otherwise known as DX.  Last week we gave a preview that today we would be announcing 2 contribution projects.  This is an unpaid program where we are looking for 1-2 contributors who will dedicate between 5-10 hours/week for at least 8 weeks.  More time is welcome, but not required.

What does DX mean?

We chose this project as we continue to experience frustration while fixing bugs and debugging test failures.  Many people suggest great ideas, in this case we have set aside a few ideas (look at the dependent bugs to clean up argument parsers, help our tests run in smarter chunks, make it easier to run tests locally or on server, etc.) which would clean up stuff and be harder than a good first bug, yet each issue by itself would be too easy for an internship.  Our goal is to clean up our test harnesses and tools and if time permits, add stuff to the workflow which makes it easier for developers to do their job!

What is required of you to participate?

  • A willingness to learn and ask questions
  • A general knowledge of programming (this will be mostly in python with some javascript as well)
  • A promise to show up regularly and take ownership of the issues you are working on
  • Good at solving problems and thinking out of the box
  • Comfortable with (or willing to try) working with a variety of people

What we will guarantee from our end:

  • A dedicated mentor for the project whom you will work with regularly throughout the project
  • A single area of work to reduce the need to get up to speed over and over again.
    • This project will cover many tools, but the general problem space will be the same
  • The opportunity to work with many people (different bugs could have a specific mentor) while retaining a single mentor to guide you through the process
  • The ability to be part of the team- you will be welcome in meetings, we will value your input on solving problems, brainstorming, and figuring out new problems to tackle.

How do you apply?

Get in touch with us either by replying to the post, commenting in the bug or just contacting us on IRC (I am :jmaher in #ateam on irc.mozilla.org).  We will point you at a starter bug and introduce you to the bugs and problems to solve.  If you have prior work (links to bugzilla, github, blogs, etc.) that would be useful to learn more about you that would be a plus.

How will you select the candidates?

There is no real criteria here.  One factor will be if you can meet the criteria outlined above and how well you do at picking up the problem space.  Ultimately it will be up to the mentor (for this project, it will be me).  If you do apply and we already have a candidate picked or don’t choose you for other reasons, we do plan to repeat this every few months.

Looking forward to building great things!

3 Comments

Filed under Community, testdev

Watching the watcher – Some data on the Talos alerts we generate

What are the performance regressions at Mozilla- who monitors them and what kind of regressions do we see?  I want to answer this question with a few peeks at the data.  There are plenty of previous blog posts I have done outlining stats, trends, and the process.  Lets recap what we do briefly, then look at the breakdown of alerts (not necessarily bugs).

When Talos uploads numbers to graph server they get stored and eventually run through a calculation loop to find regressions and improvements.  As of Jan 1, 2015, we upload these to mozilla.dev.tree-alerts as well as email to the offending patch author (if they can easily be identified).  There are a couple folks (performance sheriffs) who look at the alerts and triage them.  If necessary a bug is filed for further investigation.  Reading this brief recap of what happens to our performance numbers probably doesn’t inspire folks, what is interesting is looking at the actual data we have.

Lets start with some basic facts about alerts in the last 12 months:

  • We have collected 8232 alerts!
  • 4213 of those alerts are regressions (the rest are improvements)
  • 3780 of those above alerts have a manually marked status
    • the rest have been programatically marked as merged and associated with the original
  • 278 bugs have been filed (or 17 alerts/bug)
    • 89 fixed!
    • 61 open!
    • 128 (5 invalid, 8 duplicate, 115 wontfix/worksforme)

As you can see this is not a casual hobby, it is a real system helping out in fixing and understanding hundreds of performance issues.

We generate alerts on a variety of branches, here is the breakdown of branches and alerts/branch;

number of regression alerts we have received per branch

number of regression alerts we have received per branch

There are a few things to keep in mind here, mobile/mozilla-central/Firefox are the same branch, and for non-pgo branches that is only linux/windows/android, not osx. 

Looking at that graph is sort of non inspiring, most of the alerts will land on fx-team and mozilla-inbound, then show up on the other branches as we merge code.  We run more tests/platforms and land/backout stuff more frequently on mozilla-inbound and fx-team, this is why we have a larger number of alerts.

Given the fact we have so many alerts and have manually triaged them, what state the the alerts end up in?

Current state of alerts

Current state of alerts

The interesting data point here is that 43% of our alerts are duplicates.  A few reasons for this:

  • we see an alert on non-pgo, then on pgo (we usually mark the pgo ones as duplicates)
  • we see an alert on mozilla-inbound, then the same alert shows up on fx-team,b2g-inbound,firefox (due to merging)
    • and then later we see the pgo versions on the merged branches
  • sometimes we retrigger or backfill to find the root cause, this generates a new alert many times
  • in a few cases we have landed/backed out/landed a patch and we end up with duplicate sets of alerts

The last piece of information that I would like to share is the break down of alerts per test:

Alerts per test

number of alerts per test (some excluded)

There are a few outliers, but we need to keep in mind that active work was being done in certain areas which would explain a lot of alerts for a given test.  There are 35 different test types which wouldn’t look good in an image, so I have excluded retired tests, counters, startup tests, and android tests.

Personally, I am looking forward to the next year as we transition some tools and do some hacking on the reporting, alert generation and overall process.  Thanks for reading!

1 Comment

Filed under testdev

Re-Triggering for a [root] cause – some notes from doing it

With all this talk of intermittent failures and folks coming up with ideas on how to solve them, I figured I should dive head first into looking at failures.  I have been working with a few folks on this, specifically :parkouss and :vaibhav1994.  This experiment (actually the second time doing so) is where I take a given intermittent failure bug and retrigger it.  If it reproduces, then I go back in history looking for where it becomes intermittent.  This weekend I wrote up some notes as I was trying to define what an intermittent is.

Lets outline the parameters first for this experiment:

  • All bugs marked with keyword ‘intermittent-failure’ qualify
  • Bugs must not be resolved or assigned to anybody
  • Bugs must have been filed in the last 28 days (we only keep 30 days of builds)
  • Bugs must have >=20 tbplrobot comments (arbitrarily picked, ideally we want something that can easily reproduce)

Here are what comes out of this:

  • 356 bugs are open, not assigned and have the intermittent-failure keyword
  • 25 bugs have >=20 comments meeting our criteria

The next step was to look at each of the 25 bugs and see if it makes sense to do this.  In fact 13 of the bugs I decided not to take action on (remember this is an experiment, so my reasoning for ignoring these 13 could be biased):

  • 5 bugs were thunderbird/mozmill only
  • 3 bugs looked to be related to android harness issues
  • bug 1157090 hadn’t reproduced in 2 weeks- was APZ feature which we turned off.
  • one bug was only on mozilla-beta only
  • 2 bugs had patches and 1 had a couple of comments indicating a developer was already looking at it

This leaves us with 12 bugs to investigate.  The premise here is easy, find the first occurrence of the intermittent (branch, platform, testjob, revision) and re-trigger it 20 times (picked for simplicity).  When the results are in, see if we have reproduced it.  In fact, only 5 bugs reproduced the exact error in the bug when re-triggered 20 times on a specific job that showed the error.

Moving on, I started re-triggering jobs back in the pushlog to see where it was introduced.  I started off with going back 2/4/6 revisions, but got more aggressive as I didn’t see patterns.  Here is a summary of what the 5 bugs turned out like:

  • Bug 1161915 – Windows XP PGO Reftest.  Found the root cause (pgo only, lots of pgo builds were required for this) 23 revisions back.
  • Bug 1160780 – OSX 10.6 mochitest-e10s-bc1.  Found the root cause 33 revisions back.
  • Bug 1161052 – Jetpack test failures.  So many failures in the same test file, it isn’t very clear if I am reproducing the failure or finding other ones.  :erikvold is working on fixing the test in bug 1163796 and ok’d disabling it if we want to.
  • Bug 1161537 – OSX 10.6 Mochitest-other.  Bisection didn’t find the root cause, but this is a new test case which was added.  This is a case where when the new test case was added it could have been run 100+ times successfully, then when it merged with other branches a couple hours later it failed!
  • bug 1155423 – Linux debug reftest-e10s-1.  This reproduced 75 revisions in the past, and due to that I looked at what changed in our buildbot-configs and mozharness scripts.  We actually turned on this test job (hadn’t been running e10s reftests on debug prior to this) and that caused the problem.  This can’t be tracked down by re-triggering jobs into the past.

In summary, out of 356 bugs 2 root causes were found by re-triggering.  In terms of time invested into this, I have put about 6 hours of time to fine the root cause of the 5 bugs.

2 Comments

Filed under testdev