I took a lot of notes in Portland last week. One might not know that based on the fact that I talked so much my voice ran out of steam by the second day. Either way, in chatting with some co-workers yesterday about what we took away from Portland, I realized that there is a long list of awesomeness.
Let me caveat this by saying that some of these ideas have been talked about in the past, but despite our efforts to work with others and field interesting and useful ideas, there is a big list of great things that came to light while chatting in person:
- :bgrins mentioned a mozscreenshot tool and the need for getting a screenshot of new features in development on various platforms so UX can review the changes. Currently the process is to ask UX to download the build from try or some other location and run it locally to see the changes.
- :heycam/:jwatt – had a great and interesting Talos discussion, mostly around how to run it and validate patches/fixes locally and on try server (check out bug 1109243).
- :glandium is looking at doing some changes (I recall something with build/pgo) and wanted to know how to compare some Talos numbers to help make the right decision – this can be done with either bug 1109243, or the existing compare.py in the Talos repo (we might need some cleanup on this)
- :bobowen has been working to get csb tests working. After chatting in line to board a plane, it became clear he needs to solve some finer-grained test selection problems, many of which the ateam has on a roadmap in Q2/Q3. I see some tighter collaboration happening here.
- Thanks to chatting with :lsblakk, I am motivated to expand the talos sheriff team and look for dedicated Mozillians (or soon to become Mozillians) to work with in keeping a lid on the alerts and overall state of performance (based on what we measure).
- :lightsofapollo had a great conversation with me about TaskCluster and what barriers stood in the way of running Talos on it – this will result in some initial investigation work!
- :kats was asking me how to generate alerts for areweslimyet.com. This is very doable by posting data to the graph server.
- After a good session on how to handle intermittents (seems like the same people have this conversation every time a bunch of Mozillians get together), I am motivated to push Titanic further to find the root cause of an intermittent via brute force retriggers (ideally on weekends). In fact :dbaron has done this a few times in the last month and so have the sheriffs. This is similar to what we do to verify a talos regression, just with some different parameters.
- The same conversation about intermittents yielded a stronger desire to look at new tests coming into the system and validate their stability. The simple solution is to run the job 100 times, verify that the new test didn’t have issues and then leave it alone. Of course we could get smart and do this for all test_* files that are edited in the tree. Thanks to :ehsan for spawning this conversation.
- Discussing the idea of a Talos Sheriff with a few folks, it seems like further conversations are needed with the existing Sheriff team, as well as a chat with :vladan and :avih about what type of policy we should have for existing performance failures which are detected. I would expect some changes to be made early next year as we have more tests and need more help. My initial thoughts are specifically about requiring a response to a regression within XX hours, or the patch gets backed out. Yeah that sounds nasty, but there are probably cut and dry parameters we can set and start enforcing.
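To make the "run the job 100 times" idea from the list a bit more concrete, here is a minimal sketch in Python. It is not an existing ateam tool: `count_failures`, `is_stable` and `make_flaky_test` are made-up names, and a real version would trigger jobs on try and read back their results rather than call a local function.

```python
from typing import Callable


def count_failures(run_once: Callable[[], bool], runs: int = 100) -> int:
    """Call run_once `runs` times and return how many of those runs failed.

    run_once is a hypothetical stand-in for "trigger the job and report
    whether the new test passed"; it returns True on a pass.
    """
    failures = 0
    for _ in range(runs):
        if not run_once():
            failures += 1
    return failures


def is_stable(run_once: Callable[[], bool], runs: int = 100,
              allowed_failures: int = 0) -> bool:
    """Treat a test as stable if it fails at most allowed_failures times."""
    return count_failures(run_once, runs) <= allowed_failures


def make_flaky_test(fail_every: int) -> Callable[[], bool]:
    """Build a fake test that fails on every fail_every-th call (demo only)."""
    calls = 0

    def run_once() -> bool:
        nonlocal calls
        calls += 1
        return calls % fail_every != 0

    return run_once


if __name__ == "__main__":
    # A test that fails on every 25th run fails 4 times out of 100.
    print(count_failures(make_flaky_test(25), runs=100))
    print(is_stable(make_flaky_test(25), runs=100))
    print(is_stable(lambda: True, runs=100))
```

The `allowed_failures` knob is there because "didn’t have issues" could reasonably mean zero failures or some small tolerated rate; which one is right is a policy question, not something the sketch decides.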
Those are 10 specific topics. Even though everybody knows how to contact me or the ateam to share great ideas or frustrations, these only came out of being in the same place at the same time.
Thinking through this, when I see these folks in a real office while working from there for a few days or a week, it seems as though the conversations are smaller and not as intense. Usually it is just small talk whilst waiting for a build to finish. I believe the setup, where we are not expected to focus on our day-to-day work and instead make plans for the future, is the real innovation behind getting these topics surfaced.