Recent fixes to reduce backlog on Android phones

Last week it seemed that all of our limited-resource machines were perpetually backlogged. I wrote yesterday to provide insight into what we run and some of our limitations. This post discusses last week's Android phone backlog specifically.

The Android phones are hosted at Bitbar and we split them into pools (battery testing, unit testing, perf testing) with perf testing being the majority of the devices.

We made 6 fixes which resulted in significant wins:

  1. Recovered offline devices at Bitbar
  2. Restarted host machines to fix intermittent connection issues at Bitbar
  3. Updated the Taskcluster generic-worker startup script to consume superseded jobs
  4. Rewrote the scheduling script as multi-threaded and made more efficient use of the Bitbar APIs
  5. Turned off duplicate jobs that were on by accident last month
  6. Removed old taskcluster-worker devices

On top of this, there are 3 future improvements that could help future-proof this setup:

  1. Upgrade the Android phones from 8.0 -> 9.0 for more stability
  2. Enable power testing on generic USB hubs rather than the special hubs which require dedicated devices
  3. Merge all the separate pools together to maximize device utilization

With the fixes in place, we are able to keep up with normal load and expect that future spikes in jobs will be shorter lived, instead of lasting an entire week.

Recovered offline devices at Bitbar
Every day 2-5 devices are offline for some period of time. The Bitbar team finds some on their own and resets them; sometimes we notice them and ask for the devices to be reset. In many cases the devices are hung or have trouble rebooting (motivation for upgrading to 9.0). I will add that the week prior things started going sideways, and it was a holiday week for many, so fewer people were watching and more devices ended up in various states.

In total we have 40 Pixel 2 devices in the perf pool (plus 37 Motorola G5 devices), and 60 Pixel 2 devices when including the unittest and battery pools. On Monday July 8th we found that 19 devices were not accepting jobs and needed attention. For planning purposes we assume that 10% of the devices will be offline; in this case we had 1/3 of our devices offline, and it was merge day with a lot of big pushes running all the jobs.

Restarting host machines to fix intermittent connection issues at Bitbar
At Bitbar we have host machines, each running 4 or more docker containers; each container runs Linux with the Taskcluster generic-worker and the tools to run test jobs, and is mapped directly to a phone. The host machines are rarely rebooted or otherwise maintained, and we noticed a few instances where the docker containers had trouble connecting to the network. The fix was to update the kernel and schedule periodic reboots.

Update Taskcluster generic-worker startup script
Previously, when a job was superseded we would shut down the Taskcluster generic-worker and its docker container and clean up, then wait for another job to show up (often a 5-20 minute cycle). With the changes made, the generic-worker simply restarts (without tearing down the docker container) and quickly picks up the next job.
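
As a hedged illustration of the idea (this is not the actual startup script; the command and config path are made up), the wrapper now loops and restarts the worker process while leaving the container alone:

    # Hypothetical sketch of the startup-script change: restart generic-worker
    # in place instead of tearing down the docker container after a superseded
    # job. Command and config path are illustrative, not the real values.
    import subprocess
    import time

    WORKER_CMD = ["generic-worker", "run", "--config", "/etc/generic-worker.config"]

    def run_worker_forever():
        while True:
            result = subprocess.run(WORKER_CMD)  # exits when the current job ends
            print("generic-worker exited with code %d; restarting" % result.returncode)
            # The container and the connection to the phone stay up, so the next
            # job can be claimed in seconds rather than waiting the 5-20 minutes
            # it took to tear down and re-provision everything.
            time.sleep(2)

    if __name__ == "__main__":
        run_worker_forever()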

Rewrite the scheduling script as multi-threaded
This was the biggest area of improvement. As our jobs increased in volume and covered a wider range of runtimes, our scheduling tool iterated through the queue and the devices serially, calling the Bitbar APIs to spin up a worker and hand off a task. Each of those calls takes a few seconds per job or device, so with 100 devices it could take 10+ minutes to come back around and schedule a new job on an idle device. With the changes made last week (Bug 1563377), jobs now start in <10 seconds, which greatly increases our device utilization.
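
To show the shape of the change (a rough sketch only, not the code that landed in Bug 1563377; start_job_on_device is a hypothetical stand-in for the Bitbar API call):

    # Rough sketch of the before/after scheduling loop.
    from concurrent.futures import ThreadPoolExecutor

    def start_job_on_device(job, device):
        """Placeholder: one blocking Bitbar API round-trip (a few seconds each)."""
        pass

    def schedule_serially(pending_jobs, idle_devices):
        # Old approach: one device at a time; with ~100 devices and a few seconds
        # per call, a full pass could take 10+ minutes before an idle device got
        # new work.
        for job, device in zip(pending_jobs, idle_devices):
            start_job_on_device(job, device)

    def schedule_concurrently(pending_jobs, idle_devices, max_workers=20):
        # New approach: issue the API calls in parallel so idle devices pick up
        # work within seconds instead of waiting for the loop to come around.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for job, device in zip(pending_jobs, idle_devices):
                pool.submit(start_job_on_device, job, device)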

Turn off duplicate opt jobs and only run PGO jobs
In reviewing what was run by default per push and on try, a big oversight was discovered. When we turned PGO on for Android, all the perf jobs were scheduled for both opt and PGO when they should have only been scheduled for PGO. This was an easy fix that cut a large portion of the load (Bug 1565644).

Removed old taskcluster-worker devices
Earlier this year we switched to the Taskcluster generic-worker, and in the transition we had to split devices between the old taskcluster-worker and the new generic-worker (think of downstream branches).  Now everything runs on generic-worker, but we still had 4 devices configured with taskcluster-worker sitting idle.

Given all of these changes, we will still have backlogs that on a bad day could take 12+ hours to schedule try tasks, but we feel confident that with the current load, most of the time jobs will start in a reasonable time window, and in the worst case we will catch up every day.

One caveat to the last statement: we are enabling webrender reftests on Android, and this will increase the load by a couple of devices' worth of work per day. Any additional tests that we schedule, or a large series of try pushes, could push us past the tipping point. I suspect buying more devices would resolve many complaints about lag and backlogs, but my recommendation is to wait 2 more weeks and see whether these changes have a measurable effect on our backlog. While we wait, it would be good to agree on what an acceptable backlog is, so that when we cross that threshold regularly we can quickly determine the number of devices needed to fix the problem.


backlogs, lag, and waiting

Many times each week I see a ping on IRC or Slack asking “why are my jobs not starting on my try push?”  I want to talk about why we have backlogs and some things to consider in regards to fixing the problem.

It is a frustrating experience when you have code that you are working on or are ready to land and some test jobs have been waiting for hours to run. I personally experienced this over the last 2 weeks while trying to uplift some test-only changes to esr68, where I would only get results the next day. In fact, many of us on the team joke that we work weekends and less during the week in order to get try results in a reasonable time.

It would be a good time to cover briefly what we run and where we run it, to understand some of the variables.

In general we run on 4 primary platforms:

  • Linux: Ubuntu 16.04
  • OSX: 10.14.5
  • Windows: 7 (32 bit) + 10 (v1803) + 10 (aarch64)
  • Android: Emulator v7.0, hardware 7.0/8.0

In addition to the platforms, we often run tests in a variety of configs:

  • PGO / Opt / Debug
  • Asan / ccov (code coverage)
  • Runtime prefs: qr (webrender) / spi (socket process) / fission (upcoming)

In some cases a single test can run >90 times for a given change when iterated through all the different platforms and configurations. Every week we are adding many new tests to the system and it seems that every month we are changing configurations somehow.

In total, from January 1st to June 30th (the first half of this year), Mozilla ran >25M test jobs. In order to do that we need a lot of machines; here is what we have:

  • linux
    • unittests are in AWS – basically unlimited
    • perf tests in data center with 200 machines – 1M jobs this year
  • Windows
    • unittests are in AWS – some require instances with a dedicated GPU and that is a limited pool
    • perf tests in data center with 600 machines – 1.5M jobs this year
    • Windows 10 aarch64 – 35 laptops (at Bitbar) that run all unittests and perftests, a new platform in 2019 and 20K jobs this year
    • Windows 10 perf reference (low end) laptop – 16 laptops (at Bitbar) that run select perf tests, 30K jobs this year
  • OSX
    • unittests and perf tests run in data center with 450 mac minis – 380K jobs this year
  • Android
    • Emulators (packet.net fixed pool of 50 hosts w/4 instances/host) 493K jobs this year – run most unittests on here
      • will have much larger pool in the near future
    • real devices – we have 100 real devices (at Bitbar) – 40 Motorola G5’s and 60 Google Pixel 2’s – running all perf tests and some unittests – 288K jobs this year

You will notice that OSX, some Windows laptops, and the Android phones are a limited resource; we need to be careful about what we run on them and ensure our machines and devices are running at full capacity.

These limited-resource machines are where we see jobs scheduled but not starting for a long time. We call this backlog; it could also be referred to as lag. While it would be great to point to a public graph showing our backlog, we don't have uniform resources across all machine types. Here is a view of what we have internally for the Android devices (the bitbar_queue graph).
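
For what it's worth, the metric behind that graph is simple to compute; here is a minimal sketch (assumed field names, not our actual tooling) of the wait-time numbers we watch:

    # Minimal sketch of the backlog/lag metric: how long tasks sit between
    # being scheduled and actually starting. Field names are assumptions.
    from datetime import datetime
    from statistics import median

    def backlog_stats(tasks):
        """tasks: list of (scheduled_time, started_time) datetimes,
        with started_time set to None for tasks still waiting."""
        now = datetime.utcnow()
        waits = [((started or now) - scheduled).total_seconds() / 60
                 for scheduled, started in tasks]
        return {
            "pending": sum(1 for _, started in tasks if started is None),
            "median_wait_minutes": round(median(waits), 1),
            "max_wait_minutes": round(max(waits), 1),
        }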

When a developer pushes their code to the try server to run all the tests, many jobs finish in a reasonable amount of time, but jobs scheduled on resource-constrained hardware (such as Android phones) typically have a larger lag, which results in frustration.

How do we manage the load:

  1. reduce the number of jobs
  2. ensure tooling and infrastructure is efficient and fully operational

I would like to talk about how to reduce the number of jobs. This is really important when dealing with limited resources, but we shouldn’t ignore it on the other platforms either. The things to tweak are:

  1. what tests are run and on what branches
  2. what frequency we run the tests at
  3. what gets scheduled on try server pushes

For 1, we want to run everything everywhere if possible; since that isn’t possible, one of our tricks is to run things on mozilla-central (the branch we ship nightlies off of) and not on our integration branches. A side effect is that a regression isn’t seen for a longer period of time and finding a root cause can be more difficult. One recent fix: when PGO was enabled for Android we were running both regular tests and PGO tests at the same time for all revisions- we only ship PGO and only need to test PGO, so the jobs were cut in half with a simple fix.

Looking at 2, frequency is another lever. Many tests are for information or comparison only, not for tracking per commit.  Running most tests once/day or even once/week will still give a signal, while our most diverse and effective tests run more frequently.

The last option, 3, is where every developer has a chance to spoil the fun for everyone else. One thing is different for try pushes: they are scheduled on the same test machines as our release and integration branches, but they are put in a separate queue that runs at priority 2. Basically, if any new jobs get scheduled on an integration branch, the next available devices will pick those up, and your try push has to wait until all integration jobs for that device are finished. This keeps our trees open more often (if we had 50 commits with no tests run, we could end up backing out changes from 12 hours ago which might already have been released or might have bitrotted by the time we perform the backout). Another aspect is that there are >10K jobs one could possibly run when scheduling a try push, and knowing what to run is hard. Many developers know what to run, and some over-schedule, either out of difficulty in job selection or out of caution.
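
A tiny sketch of that two-priority behaviour (illustrative only; the real scheduling lives in Taskcluster, not in a script like this):

    # Illustrative only: integration-branch work (priority 1) always drains
    # before try work (priority 2), regardless of submission order.
    import heapq

    PRIORITY = {"mozilla-central": 1, "autoland": 1, "try": 2}

    queue = []
    heapq.heappush(queue, (PRIORITY["try"], "10:00", "try push perf job"))
    heapq.heappush(queue, (PRIORITY["autoland"], "10:05", "autoland perf job"))

    while queue:
        priority, submitted, job = heapq.heappop(queue)
        print(priority, submitted, job)
    # Prints the autoland job first even though the try job was queued earlier.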

Keeping all of this in mind, I often see pushes to our try server scheduling what looks to be way too many jobs on hardware. Once someone does this, everybody else who wants to get their 3 jobs run has to wait in line behind a queue of jobs (many times 1000+) which often only get run overnight, North America time.

I would encourage developers pushing to try to really question whether they need all jobs, or just a sample of the possible jobs.  With tools like |./mach try fuzzy|, |./mach try chooser|, or |./mach try empty| it is easier to schedule what you need instead of using blanket commands that run everything.  I also encourage everyone to cancel old try pushes if a second try push has been made to fix errors from the first- that alone saves a lot of unnecessary jobs from running.

 


Experiment: Adjusting SETA to run individual files instead of individual jobs

3.5 years ago we implemented and integrated SETA.  This has a net effect today of reducing our load by 60-70%.  SETA works on the premise of identifying the specific test jobs that find real regressions and marking them as high priority.  While this logic is not perfect, it provides a great savings of test resources while not adding a large burden to our sheriffs.
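
To make the premise concrete, here is a minimal sketch of the kind of reduction SETA does (a greedy set-cover over historical failure data; this is not the actual SETA code, and the job names are made up):

    # Not the real SETA implementation - just the core idea: pick the smallest
    # set of jobs that would still have caught every known regression.
    def pick_high_priority_jobs(regressions):
        """regressions: dict of regression id -> set of job names that caught it."""
        uncovered = set(regressions)
        chosen = []
        while uncovered:
            # Greedily pick the job that catches the most still-uncovered regressions.
            coverage = {}
            for reg in uncovered:
                for job in regressions[reg]:
                    coverage[job] = coverage.get(job, 0) + 1
            best = max(coverage, key=coverage.get)
            chosen.append(best)
            uncovered = {r for r in uncovered if best not in regressions[r]}
        return chosen

    history = {
        "regression-1": {"linux64/opt-mochitest-1", "windows10/opt-mochitest-1"},
        "regression-2": {"linux64/opt-mochitest-1"},
        "regression-3": {"android-hw/opt-reftest-3"},
    }
    print(pick_high_priority_jobs(history))
    # ['linux64/opt-mochitest-1', 'android-hw/opt-reftest-3'] - the other jobs
    # can run at a reduced frequency.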

There are two things we could improve upon:

  1. a test job that finds a failure runs dozens if not hundreds of tests, even though typically only a single test in that job actually failed.
  2. for jobs that are split into multiple chunks, a test that fails in chunk 1 today could run in chunk X in the future, making this approach less reliable.

I did an experiment in June (I was on PTO and busy migrating a lot of tests in July/August) where I ran some queries on the treeherder database to find the actual test cases that caused the failures instead of only the job names.  I came up with a list of 171 tests that we needed to run; these ran in 6 jobs in the tree using 147 minutes of CPU time.
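
The aggregation step itself is conceptually simple; here is a hedged sketch (made-up field names, not the actual queries or SETA code) of going from failure records to the unique test files to schedule:

    # Hypothetical shape of the aggregation: from (job, test path) failure rows
    # pulled out of treeherder to a de-duplicated, manifest-style list of tests.
    from collections import Counter

    def unique_failing_tests(failure_rows):
        """failure_rows: iterable of dicts like
        {"job": "linux64/opt-mochitest-3", "test": "dom/media/test/test_x.html"}."""
        counts = Counter(row["test"] for row in failure_rows if row.get("test"))
        return sorted(counts)

    rows = [
        {"job": "linux64/opt-mochitest-3", "test": "dom/media/test/test_a.html"},
        {"job": "windows10/debug-mochitest-1", "test": "dom/media/test/test_a.html"},
        {"job": "linux64/opt-xpcshell-2", "test": "netwerk/test/unit/test_b.js"},
    ]
    print(unique_failing_tests(rows))
    # ['dom/media/test/test_a.html', 'netwerk/test/unit/test_b.js']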

This was a fun project and it gives some insight into what a future could look like.  The future I envision is picking high priority tests via SETA and using code coverage to find additional tests to run.  There are a few caveats which make this tough:

  1. Not all failures we find are related to a single test- we have shutdown leaks, hangs, CI and tooling/harness changes, etc.  This experiment only covers tests that we could specify in a manifest file (about 75% of the failures)
  2. My experiment didn’t load balance on all configs.  SETA does a great job of picking the fewest jobs possible; by knowing that a failure is Windows-specific, we can run on Windows and not schedule on linux/osx/android.  My experiment was to see if we could run the tests at all, but right now we have no way to schedule a list of test files and specify which configs to run them on. Of course we could limit this to running “all these tests” on “this list of configs”.  Running 147 minutes of execution on 27 different configs doesn’t save us much; it might take more time than what we currently do.
  3. It was difficult to get the unique test failures.  I had to do a series of queries on the treeherder data, then parse it up, then adjust a lot of the SETA aggregation/reduction code to finally get a list of tests. This would require a few days of work to sort out if we wanted to go this route, and we would need to figure out what to do with the other 25% of failures.
  4. The only way to run this is using the per-test style used for test-verify (and the in-progress per-test code coverage).  This has the problem of changing the way we report tests in the treeherder UI- it is hard to know what we ran and didn’t run, and summarizing failures across bugs could be interesting- we need a better story for running tests and reporting them without caring about chunks and test harnesses (for example, see my running-tests-by-component experiment).
  5. Assuming this was implemented- this model would need to be tightly integrated into the sheriffing and developer workflow.  For developers, if you just want to run xpcshell tests, what does that mean for what you see on your try push?  For sheriffs, if there is a new failure, can we backfill it and find which commit caused the problem?  Can we easily retrigger the failed test?

I realized I did this work and never documented it.  I would be excited to see progress made towards running a more simplified set of tests, ideally reducing our current load by 75% or more while keeping our quality levels high.


Project Stockwell (reduce intermittents) – April 2017

I am 1 week late in posting the update for Project Stockwell.  This wraps up a full quarter of work.  After a lot of concerns were raised by developers about a proposed new backout policy, we moved on and didn’t change too much, although we did push a little harder, and I believe we have disabled more than we fixed as a result.

Let’s look at some numbers:

Week Starting   01/02/17   02/27/17   03/24/17
Orange Factor   13.76      9.06       10.08
# P1 bugs       42         32         55
OF (P2)         7.25       4.78       5.13

As you can see, all the numbers increased in March- but overall there has been a great decrease so far in 2017.

There have been a lot of failures which have lingered for a while and which are not specific to a single test.  For example:

  • windows 8 talos has a lot of crashes (work is being done in bug 1345735)
  • reftest crashes in bug 1352671.
  • general timeouts in jobs in bug 1204281.
  • and a few other leaks/timeouts/crashes/harness issues unrelated to a specific test
  • infrastructure issues and tier-3 jobs

While these are problematic, we see the overall failure rate going down.  In all the other bugs, where the test is clearly the problem, we have seen many fixes and responses to bugs from many test owners and developers.  It is rare that we suggest disabling a test and it is not agreed upon, and when there was concern we had a reasonable solution to reduce or fix the failure.

Speaking of which, we have been tracking total bugs, fixed, disabled, etc. with whiteboard tags.  While there was a request to not use “stockwell” in the whiteboard tags and to make them more descriptive, after discussing this with many people we couldn’t come to agreement on names, on what to track, or on what we would do with the data- so for now, we have kept them the same.  Here is some data:

                 03/07/17   04/11/17
total            246        379
fixed            106        170
disabled         61         91
infrastructure   11         17
unknown          44         60
needswork        24         38
% disabled       36.53%     34.87%

What is interesting is that prior to March we had disabled 36.53% of the resolved bugs, but in March, when we were more “aggressive” about disabling tests, the overall percentage went down.  In fact this is a cumulative number for the year; for the month of March alone we only disabled 31.91% of the resolved tests.  Possibly if we had disabled a few more tests, the overall numbers would have continued to go down instead of ticking slightly up.
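
The table does not state the denominator, but the numbers are consistent with disabled as a share of fixed + disabled; working that through for the cumulative and March-only views:

    # Reproducing the "% disabled" numbers, assuming the denominator is
    # fixed + disabled (which matches the table above).
    fixed_mar, disabled_mar = 106, 61     # cumulative as of 03/07/17
    fixed_apr, disabled_apr = 170, 91     # cumulative as of 04/11/17

    cumulative = disabled_apr / (fixed_apr + disabled_apr)
    march_only = (disabled_apr - disabled_mar) / (
        (fixed_apr - fixed_mar) + (disabled_apr - disabled_mar))

    print("cumulative: %.2f%%, March alone: %.2f%%" % (cumulative * 100, march_only * 100))
    # cumulative: 34.87%, March alone: 31.91%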

A lot of changes took place on the tree in the last month; here is some interesting data on newer jobs:

  • taskcluster windows 7 tests are tier-2 for almost all windows VM tests
  • autophone is running all media tests which are not crashing or perma failing
  • disabled external media tests on linux platforms
  • added stylo mochitest and mochitest-chrome
  • fixed stylo reftests to run in e10s mode and on ubuntu 16.04

Upcoming job changes that I am aware of:

  • more stylo tests coming online
  • more linux tests moving to ubuntu 16.04
  • push to green up windows 10 taskcluster vm jobs

Regarding our tests, we are working on tracking new tests added to the tree, what components they belong in, what harness they run in, and overall how many intermittents we have for each component and harness.  Some preliminary work shows that we added 942 mochitest*/xpcshell tests in Q1 (609 were imported webgl tests, so we wrote 333 new tests, 208 of those are browser-chrome).  Given the fact that we disabled 91 tests and added 942, we are not doing so bad!

Looking forward into April and Q2, I do not see immediate policy changes needed; maybe in May we can finalize a policy and make it more formal.  With the recent re-org, we are now in the Product Integrity org.  This is a good fit, but dedicating full time resources to sheriffing and tooling for the sake of project stockwell is not in the mission.  Some of the original work will continue as it serves many purposes.  We will be looking to formalize some of our practices and tools to make this a repeatable process to ensure that progress can still be made towards reducing intermittents (we want <7.0) and creating a sustainable ecosystem for managing these failures and getting fixes in place.

 


Project Stockwell (reduce intermittents) – March 2017

Over the last month we had a higher rate of commits, failures, and fixes. One large change is that we turned on stylo-specific tests, and that was a slightly rocky road. Last month we suggested disabling tests after 2 weeks of seeing the failures. We ended up disabling many tests, but fixing many more.

In addition to more disabling of tests, we implemented a set of bugzilla whiteboard entries to track our progress:

  • [stockwell fixed] – a fix went in (even if it only partially fixed the problem)
    • in the last 2 months, we have 106
  • [stockwell disabled] – we disabled the test in at least one config and there is no fix
    • in the last 2 months, we have 61
  • [stockwell infra] – infra issues, which are usually externally driven
    • in the last 2 months, we have 11
  • [stockwell unknown] – the bug became less intermittent with no clear reason
    • in the last 2 months, we have 44
  • [stockwell needswork] – bugs in progress
    • in the last 2 months, we have 24

We have also been tracking the orange factor and number of high frequency intermittents:

Week starting                 Jan 02, 2017   Jan 30, 2017   Feb 27, 2017
Orange Factor (OF)            13.76          10.75          9.06
# priority intermittents      42             61             32
OF – priority intermittents   7.25           5.78           4.78

I added a new row here, tracking the Orange Factor assuming all of the high frequency intermittent bugs didn’t exist.  This is what the long tail looks like and I am really excited to see that number going down over time.  For me a healthy spot would be OF <5.0 and the long tail <3.0.

We also looked at the number of unique bugs and repeat bugs/week.  Most bugs have a lifecycle of 2 weeks and 2/3 of the bugs we see in a given week were high frequency (HF) the week prior.  For example this past week we had 32 HF bugs and 21 of them were from the previous week (11 were still HF 2 weeks prior).

While it would be easy to assume we should just disable all of these tests, we find that many developers are actively working on the issues, as shown by the fact that we have many more fixed bugs than disabled bugs.  The main motivation for disabling tests is to reduce the confusion for developers on try and to reduce the work the sheriffs need to do.  Taking this data into account, we are looking to adjust our policy for disabling slightly:

  1. all high frequency bugs (>=30 times/week) will be triaged and expected to be resolved in 2 weeks, otherwise we will start the process of disabling the test that is causing the bug
  2. if a bug occurs >75 times/week, it will be triaged but expectations are that it will be resolved in 1 week, otherwise we will start the process of disabling the test that is causing the bug
  3. if a bug is reduced below a high frequency (< 30 times/week), we will be happy to make a note of that and keep an eye on it- but will not look at disabling the test.

The big change here is that we will be more serious about disabling tests, specifically when a bug occurs >=75 times/week.  We have had many tests failing at least 50% of the time for weeks; these show up on almost every try push that runs them.  Developers should not be seeing failures like these.  Since we are tracking fixed vs disabled, if we determine that we are disabling too much, we can revisit this policy next month.
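
A small sketch of the thresholds above as a triage helper (hypothetical, purely to show how the two cut-offs relate):

    # Hypothetical helper encoding the policy above: the failure count for the
    # week determines how long we wait before starting to disable the test.
    def triage(failures_per_week):
        if failures_per_week >= 75:
            return "high frequency: expect resolution in 1 week, then start disabling"
        if failures_per_week >= 30:
            return "high frequency: expect resolution in 2 weeks, then start disabling"
        return "below threshold: note it and keep an eye on it"

    for count in (20, 45, 120):
        print(count, "->", triage(count))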

Outside of numbers and policy, our goal is to have a solid policy, process, and toolchain available for self-triaging as the year goes on.  We are refining the policy and process via manual triage.  The toolchain is the other piece of work we are doing; here are some updates:

  • adding BUG_COMPONENTS to all files in m-c (bug 1328351) – slow and steady progress, thanks for the reviews to date!  We got behind to get SETA completed, but much of the heavy lifting is already done
  • retrigger an existing job with additional debugging arguments (bug 1322433) – main discussion is done, figuring out small details, we have a prototype working with little work remaining.  Next steps would be to implement the top 3 or 4 use cases.
  • add a test-lint job to linux64/mochitest (bug 1323044) – no progress yet- this got put on the backburner as we worked on SETA and focused on triage, whiteboard tags, and BUG_COMPONENTS.  We have landed code for using the ‘when’ clause for test jobs (bug 1342963) which is a small piece of this.  Getting this initially working will move up in priority soon, and making this work on all harnesses/platforms will most likely be a Google Summer of Code project.

Are there items we should be working on or looking into?  Please join our meetings.


Project Stockwell – February 2017

I realized my post for last month was titled “Project Stockwell – January 2016” – that is a fun typo to make 🙂

Last month we focused on triaging all bugs that met our criteria of >=30 failures/week.  Every day there are many new bugs to triage and we started with a large list.  In the end we have commented on all the bugs and have a small list every day to revisit or investigate.

One thing we focus on is only requesting assistance at most once per week- to that end we have a “Neglected Oranges” dashboard that we use daily.

What is changing this month: we will be recommending resolution of priority bugs (>=30 failures/week) within 2 weeks' time.  Resolution means active debugging, landing changes to the test to reduce, debug, or fix the intermittent, or, when there is a lack of time or no easy fix, disabling the test.  If this goes well, we will reduce that down to 7 days in March.

So how are we doing?

Week starting              Jan 02, 2017   Jan 30, 2017
Orange Factor              13.76          10.75
# priority intermittents   42             61

We have fewer overall failures, but the bugs are more spread out.  Some interesting bugs:

In terms of projects underway, here is some status:

  • adding BUG_COMPONENTS to all files in m-c (bug 1328351) – slow and steady progress, thanks for the reviews to date!  We expect the large majority of this to be completed this month.
  • retrigger an existing job with additional debugging arguments (bug 1322433) – main discussion is done, figuring out small details, should see a prototype this month
  • add |mach test-info| support (bug 1324470) – landed today!
  • add a test-lint job to linux64/mochitest (bug 1323044) – no progress yet, I expect some this month.

Are there items we should be working on or looking into?  Please join our meetings.


Working towards a productive definition of “intermittent orange”

Intermittent Oranges (tests which fail sometimes and pass other times) are an ever-increasing problem with test automation at Mozilla.

While there are many common causes for failures (bad tests, the environment/infrastructure we run on, and bugs in the product), we still do not have a clear definition of what we view as intermittent.  Some common statements I have heard:

  • It’s obvious, if it failed last year, the test is intermittent
  • If it failed 3 years ago, I don’t care, but if it failed 2 months ago, the test is intermittent
  • I fixed the test to not be intermittent, I verified by retriggering the job 20 times on try server

These imply very different definitions of what is intermittent. A useful definition will need to:

  • determine if we should take action on a test (programmatically or manually)
  • define a policy that sheriffs and developers can use to guide their work
  • guide developers to know when a new/fixed test is ready for production
  • provide useful data to release and Firefox product management about the quality of a release

Since I wanted a clear definition of what we are working with, I looked over 6 months (2016-04-01 to 2016-10-01) of OrangeFactor data (7330 bugs, 250,000 failures) to find patterns and trends.  I was surprised at how many bugs had <10 instances reported (3310 bugs, 45.1%).  Likewise, I was surprised that such a small number (1236) of bugs accounts for >80% of the failures.  It made sense to look at things daily, weekly, monthly, and every 6 weeks (our typical release cycle).  After much slicing and dicing, I came up with 4 buckets:

  1. Random Orange: this test has failed, even multiple times in history, but in a given 6 week window we see <10 failures (45.2% of bugs)
  2. Low Frequency Orange: this test might fail up to 4 times in a given day, but typically <=1 failure per day; in a 6 week window we see <60 failures (26.4% of bugs)
  3. Intermittent Orange: fails up to 10 times/day or <120 times in 6 weeks.  (11.5% of bugs)
  4. High Frequency Orange: fails >10 times/day many times and are often seen in try pushes.  (16.9% of bugs or 1236 bugs)
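
A compact way to express those buckets (simplifying away the per-day nuance and using only the 6-week totals given above):

    # Classify a bug by its failure count in a 6-week window, per the bucket
    # definitions above (the daily-rate detail is simplified away).
    def bucket(failures_in_6_weeks):
        if failures_in_6_weeks < 10:
            return "1: random orange"
        if failures_in_6_weeks < 60:
            return "2: low frequency orange"
        if failures_in_6_weeks < 120:
            return "3: intermittent orange"
        return "4: high frequency orange"

    for count in (3, 40, 100, 500):
        print(count, "->", bucket(count))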

Alternatively, we could simplify our definitions and use:

  • low priority or not actionable (buckets 1 + 2)
  • high priority or actionable (buckets 3 + 4)

Does defining these buckets by the number of failures in a given time window help us with what we are trying to solve with the definition?

  • Determine if we should take action on a test (programmatically or manually):
    • ideally buckets 1/2 can be detected programmatically with autostar and removed from our view.  Possibly rerunning to validate it isn’t a new failure.
    • buckets 3/4 have the best chance of reproducing, we can run in debuggers (like ‘rr’), or triage to the appropriate developer when we have enough information
  • Define a policy that sheriffs and developers can use to guide work
    • sheriffs can know when to file bugs (either buckets 2 or 3 as a starting point)
    • developers understand the severity based on the bucket.  Ideally we will need a lot of context, but understanding severity is important.
  • Guide developers to know when a new/fixed test is ready for production
    • If we fix a test, we want to ensure it is stable before we make it tier-1.  A developer can do the math with ~300 commits/day and ensure the test passes across those runs.
    • NOTE: SETA and coalescing ensures we don’t run every test for every push, so we see more likely 100 test runs/day
  • Provide useful data to release and Firefox product management about the quality of a release
    • Release Management can take the OrangeFactor into account
    • new features might be required to keep their test failure volume at or below the Random Orange level

One other way to look at this is what gets posted in bugs (by the war on orange bugzilla robot).  There are simple rules:

  • 15+ times/day – post a daily summary (bucket #4)
  • 5+ times/week – post a weekly summary (bucket #3/4 – about 40% of bucket 2 will show up here)

Lastly, I would like to cover some exceptions and ways in which some might see this as flawed:

  • missing or incorrect data in orange factor (human error)
  • some issues have many bugs, but a single root cause- we could miscategorize a fixable issue

I do not believe adjusting a definition will fix the above issues- possibly different tools or methods to run the tests would reduce the concerns there.
