
Retry flakey tests #934


Closed · Awk34 opened this issue Jun 23, 2016 · 46 comments

@Awk34 commented Jun 23, 2016

I've searched around for a bit and couldn't find anything about retrying flakey tests with AVA.

I've been working on trying AVA for the generator-angular-fullstack Yeoman generator to see if it can speed up our tests. Right now in CircleCI our tests take about 15 minutes. This mainly consists of running a few different configurations of the generator, then some test commands on the scaffolded code, namely linting, client tests, server tests, and E2E tests. The E2E tests can be quite flakey. Right now we use Mocha, which has the ability to add this.retries(2) to a test to retry it x times if it fails. Is there any plan to add something like this to AVA? If not, it would definitely be helpful to note some way of achieving this in the readme.
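For reference, this is roughly what Mocha's retry API looks like (Mocha, not AVA; note it requires function () {} rather than arrow functions, since this must be the Mocha test context):

describe('e2e', function () {
  this.retries(2); // rerun a failing test in this suite up to 2 more times

  it('scaffolds the app', function () {
    // flaky E2E assertions here
  });
});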

Environment

  • ava --version: 0.15.2
  • node -v: v5.11.1
  • npm -v: 3.8.6
@jamestalmage (Contributor)

I think retry is a pretty interesting idea. Lots of projects have a flaky test or two (AVA included).

Is there ever really a need for retry(3)? Could we just always use 2 and not offer a config option?

test.retry(t => {})

@Awk34 (Author) commented Jun 23, 2016

I think the number of attempts needs to be configurable. For the generator, I have all the E2E tests retry twice, but the tests for the Sequelize setup are especially flakey, so I've set those to retry thrice.

edit: Could test.retry(2)(t => {}) be done? (with test.retry(t => {}) defaulting to 2)

@jamestalmage (Contributor)

Could test.retry(2)(t => {}) be done? (with test.retry(t => {}) defaulting to 2)

Yes. I'm not necessarily opposed to it. Just trying to suggest as many alternatives as possible for discussion.

I think the number of attempts needs to be configurable.

OK. What if we just allowed 2 or 3:

test.flakey(t => {}) // retry twice
test.veryFlakey(t => {}) // retry thrice

We could extend the flakey keyword to assertions:

test(t => {
  t.flakey.is(foo, bar); // test will be retried up to one more time if a `flakey` assertion fails
  t.veryFlakey.is(foo, bar); // test will be retried up to two more times if a `veryFlakey` assertion fails
  t.is(foo, bar); // if a non-flakey assertion fails - no retries
});

test.flakey(t => {
  // any assertion failure causes a retry
});

test.veryFlakey(t => {
  // any assertion failure causes up to two retries
});

For the flakey assertions, I'm thinking of some of AVA's CLI tests on Windows where we sometimes don't capture the full stdout. Those are slow tests. If the exit code is wrong we should just fail fast - because it's some other problem we don't want masked by retries. If the entire message didn't get printed (because Windows) - well then there is nothing we can do about that - retry and see what happens.
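A sketch of that scenario using the proposed (not yet existing) t.flakey API, where run is a placeholder for however the CLI test captures output:

test('CLI prints usage', async t => {
  const result = await run(['--help']);
  t.is(result.code, 0); // wrong exit code: fail fast, no retries
  t.flakey.regex(result.stdout, /Usage/); // truncated stdout (because Windows): allow a retry
});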


We could provide some more detailed summaries about flakey assertions as well:

5 passed
  2 flakey assertions had to be retried
  3 flakey assertions had to be retried twice

5 passed
2 failed
  1 flakey assertion failed even after being retried
  1 flakey assertion failed even after being retried twice

@Awk34 (Author) commented Jun 23, 2016

test.flakey(t => {}) // retry twice
test.veryFlakey(t => {}) // retry thrice

Seems reasonable, but I'm sure you'll get someone who asks for the ability to retry more sometime down the road.

We could provide some more detailed summaries about flakey assertions as well:
[...]

I like that a lot. I don't think mocha has the option of showing if a flakey test was retried.

@jamestalmage (Contributor)

but I'm sure you'll get someone who asks for the ability to retry more sometime down the road.

Isn't a test that regularly fails three times in a row basically useless? If the repeated failure is because you are running on an overcrowded CI VM, or because of network issues, aren't you just as likely to fail four times in a row at that point?

Perhaps we could make the number of retries configurable based on context:

"ava": {
  "retries": {
    "local": [0, 0],       // no retries on devs local machine
    "ci": [1, 2]           // flakey = 1 retries, veryFlakey = 2 retries on CI
  }
}

Overly complicated version:

"ava": {
  "retries": {
    "local": [0, 0],       // no retries on devs local machine
    "win32": [0, 1],       // veryFlakey retries on Windows machines - even on dev machines
    "ci": [1, 2],          // flakey = 1 retries, veryFlakey = 2 retries on CI
    "appveyor": [2, 3]     // extra retries for AppVeyor
  }
}
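A hypothetical resolver for that context-based config (the names and detection logic here are assumptions, not AVA code):

function resolveRetries(config) {
  if (process.env.APPVEYOR && config.appveyor) return config.appveyor;
  if (process.platform === 'win32' && config.win32) return config.win32;
  if (process.env.CI && config.ci) return config.ci;
  return config.local || [0, 0]; // [flakeyRetries, veryFlakeyRetries]
}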

@Awk34 (Author) commented Jun 23, 2016

Isn't a test that regularly fails three times in a row basically useless? If the repeated failure is because you are running on an overcrowded CI VM, or because of network issues, aren't you just as likely to fail four times in a row at that point?

I completely agree; just saying, I imagine people would want it anyway.

Perhaps we could make the number of retries configurable based on context:
[...]

Seems reasonable. Maybe even just something like this:

"ava": {
  "flakey": 2,
  "veryFlakey": 3
}

@jamestalmage (Contributor)

For AVA, our "flakey" tests are basically only flakey on AppVeyor. They run reliably both locally and on Travis. My concern would be allowing flakiness to creep into our Travis tests without us noticing.

@Awk34 (Author) commented Jun 23, 2016

Would there be an easy way to set some flakey config variable (in the AVA config or via a CLI flag) from AppVeyor, instead of introducing platform-specific code into your AVA config?

@jamestalmage (Contributor)

We could do it:

  1. Via CLI:

    ava --retries=1,2

  2. Via environment variables:

    set AVA_RETRIES="1,2"

  3. Via config:

    "ava": {
      "retries": [1, 2]
    }

@Awk34 (Author) commented Jun 23, 2016

Yeah, those seem a lot better.

@jamestalmage (Contributor)

I think that would be the order of precedence (CLI flag, environment variable, config):

  • Set up config for your defaults on developer machines.
  • Override those using environment variables for CI machines.
  • Use the CLI flags to override for one-off runs.

Are environment variables even worth it? Or should we just do config and CLI?

My thought was that environment variables would allow you to reuse your npm test script without modification, instead of creating a special npm run test-appveyor that was the same but for a few config settings. Especially useful if you have a build step in your pretest script.
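A sketch of that precedence (CLI flag, then environment variable, then config; all names here are assumptions):

function resolveRetriesSetting(cliFlags, packageConfig) {
  if (cliFlags.retries) return cliFlags.retries.split(',').map(Number); // ava --retries=1,2
  if (process.env.AVA_RETRIES) return process.env.AVA_RETRIES.split(',').map(Number); // set AVA_RETRIES="1,2"
  if (packageConfig.retries) return packageConfig.retries; // "ava": { "retries": [1, 2] }
  return [0, 0]; // default: no retries
}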

@Awk34 (Author) commented Jun 23, 2016

If I tell my CI to run npm test and I have my script as "test": "ava somefile.js", I could easily run npm test -- --retries=1,2 as well, right?

I think if anyone ever comes up with the need for env var configuration then it can be discussed, but until then, nah. There's no other use of env vars like this in this project (from what I can tell).

@rozzzly commented Jun 24, 2016

@jamestalmage

Isn't a test that regularly fails three times in a row basically useless?

A fantastic point. But IMHO there really isn't a good reason not to make it configurable... As others have pointed out, there are certainly people out there whose setups would truly merit the option to configure this.

CLI flags / package.json fields would be a simple fix, but what about someone who'd like the number of retries to differ from test to test? Might I suggest:

To get the default number of retries, flakey can be accessed as a property on the chain, like so:

t.flakey.is(foo, bar);  // retry assertion twice
test.flakey(t => {
    // ...
}) // retry test twice

But then it could also be a function that accepts an argument like retries:

t.flakey(7).is(foo, bar); // retry assertion 7 times

As far as setting a custom retry count for a test goes, the equivalent syntax might be:

test.flakey(7)(t => {
    // ...
}); // retry test 7 times

But I think the (nTries: number)(test: (t: ContextualTestContext) => void) call chain is kind of ugly directly in front of test(). Another option might be:

test.flakey(7, 'testTitle', t => {
   // ...
});

That, however, is a departure from the test() syntax; it might be a little confusing. But logically it would work (a sketch follows the list):

  1. If arguments.length === 3, then arguments[0] is (a number for) the retry count, arguments[1] is (a string for) the title, and arguments[2] is the test implementation (each type-checked, of course).
  2. If arguments.length === 2, then arguments[0] is (a number for) the retry count and arguments[1] is the test implementation (all type-checked, of course).
    • The test title would then, if possible, be inferred from the function's name as per usual.
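A sketch of that argument juggling (hypothetical; registerTest stands in for AVA's internal test registration):

function flakey(...args) {
  let retries = 2; // default retry count
  if (typeof args[0] === 'number') retries = args.shift();
  const title = typeof args[0] === 'string' ? args.shift() : undefined;
  const fn = args.shift();
  if (typeof fn !== 'function') throw new TypeError('Expected a test implementation');
  return registerTest(title || fn.name || undefined, fn, {retries});
}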

@jamestalmage (Contributor) commented Jun 24, 2016

I don't think you would ever want to retry assertions, just tests.

In my mind the flakey assertion is there to differentiate which assertions are allowed to fail. As a convenience, if you use a flakey assertion, we mark the test flakey. I don't see the point in allowing t.flakey(7).is(...).

As for configuring a specific number of retries with per-test granularity, I just don't see it. If we give you a global configuration for what flakey and veryFlakey mean, do you really think there is some scenario where you are going to need more control than that?

@rozzzly commented Jun 24, 2016

A few more thoughts:

  • Do retries happen immediately, or can the delay be configured?
    • How does that affect other tests in the file? Do they keep running as soon as it fails, with the failed ones retested afterwards, or does everything block and wait until the test passes or the retry limit is exhausted?
    • What might that syntax look like?
  • I assume retried tests would run in the same process in which they failed the first time around. We should ask: how does this affect state pollution? I'd imagine beforeEach()/afterEach() would rerun when a flakey test fails?

Now that's definitely outside the scope of AVA; it's something you'd want an external tool to manage. Such a tool could rerun failed tests when some user-defined conditions are met. That could actually be really useful for testing sessions on servers where, over time, the state gets more and more muddled, or for scenarios where connection/resource access might be intermittent.

These were some questions I asked myself when reading this issue. I'm not sure what use they might be to the AVA team, but I've always found it's better to consider a broad range of possibilities before committing to anything. TBH, I can't think of many cases where I, personally, would be using flakey all that much, but perhaps something I said might be desired by someone else, or at least give you all a different perspective.

@jamestalmage (Contributor)

I assume retried tests would run in the same process in which they failed the first time around.

Yes. If you are writing tests that pollute state and are dependent on each other, then we would need to rerun the whole test file again anyway. We aim to encourage atomic tests.

I'd imagine beforeEach()/afterEach() would rerun when a flakey test fails

Agreed.

@rozzzly commented Jun 24, 2016

@jamestalmage
As far as retrying assertions goes, you were the first person to mention such a thing 😒 I'm just following your line of thought, my friend:

t.flakey.is(foo, bar); #934 (comment)

And you're right, it is pretty ridiculous; I suspect I'm probably not going to use flakey more than once or twice in my current projects. I merely suggested a syntax/design pattern/what-have-you that would let users tweak it as much as one might want, in a way that is somewhat consistent with the rest of the API.

AVA is pretty damn opinionated, and that's what has made it so successful! I certainly don't begrudge the team this; in fact, I greatly admire such adamancy!! So if you'd rather keep the API minimalistic, I totally respect that. Just throwing out ideas here, man. 💭 As I described in my second comment, such a thing would probably be better suited as its own tool. Now I'm imagining an ava-daemon where you could track/trigger/retry AVA tests programmatically. I can think of quite a few cases in which it might be desirable to trigger a test when some condition is met. Might it not be much easier than mocking some complex state? (Detecting a condition vs. recreating one.) Not that I'm suggesting doing away with the core philosophy of AVA tests, just a way to retest something under certain conditions to cover more edge cases.


Ah good; as it should be! Atomic tests all the way, man; better to have the test blow up 💥 than to write ill-conceived tests.

@jamestalmage (Contributor) commented Jun 24, 2016

As far as retrying assertions goes, you were the first person to mention such a thing 😒

I am still proposing t.flakey.is(foo, bar);

test(t => {
  // If this assertion fails, test fails, no retries. It's basically a fail-fast assertion
  t.is(foo, bar);      

  // If this fails, entire **test** will be retried (not just the assertion).      
  t.flakey.is(bar, baz); 
});

If you do test.flakey(t => {}), then there is no fail-fast assertion. Test will be retried on any assertion failure.

@novemberborn (Member)

This sounds good. @jamestalmage would you mind summarizing the design decisions? Which questions still need to be answered?

Also, 🚲 🏠 wise, is it flaky or flakey? 😉

@jamestalmage (Contributor)

I gravitated towards flaky, but saw others were spelling it differently, so I just assumed I was wrong. Turns out both spellings are correct. Not sure what to do about that. Do we alias them to mean the same thing? I'm normally against aliases, but it seems harsh to penalize either correct spelling. If we do alias, we should add a linter rule that lets people prefer one or the other (you likely wouldn't want mixed use in the same project). Of the two, I would probably prefer flaky - it's how I naturally wanted to spell it, and it's shorter.

@Awk34 (Author) commented Jun 26, 2016

Aliasing two correct spellings seems reasonable to me, but if there's a more commonly used spelling, that's the one that should be presented in the docs.

@jamestalmage (Contributor)

would you mind summarizing the design decisions?

  • Add two methods to the test chain:

    • test.flaky(t => {}) - If the test fails it will retry one more time.
    • test.veryFlaky(t => {}) - If the test fails, it will retry two more times.
  • Add two configuration options, that allow you to adjust how many times it gets retried.

    {
      "ava": {
        "flaky":  1,  // retry once
        "veryFlaky": 2  // retry twice
      }
    }

    IMO, if they only configure flaky, veryFlaky should default to flaky + 1.

  • Add two CLI flags --flaky=1, --very-flaky=2. These take precedence over the config, allowing you to adjust the numbers higher on AppVeyor (etc).

  • Extend t.flaky and t.veryFlaky with all the same assertions you have on the normal assertion object:

    t.flaky.is(foo, bar);
    t.flaky.notThrows(promise);

    You would use flaky assertions with a normal non-flaky test, allowing you to pick out only one or two assertions that were allowed to be flaky. Those flaky assertions would not be rerun immediately. Instead their failure would cause the entire test to be rerun. Essentially, in this use, your non-flaky assertions are fail-fast for that test.

  • We would rerun the test entirely, including any test.beforeEach hooks.

  • We would run t.afterEach.always hooks after a failed attempt, before retrying the test, but not plain t.afterEach hooks (see the sketch below).

  • The reporters would report flaky failures, reminding you that flaky tests still exist in your codebase (even if they aren't failing the build):

    Passing build with flaky failures that resolved on retry:

    5 passed
      2 flakey assertions had to be retried
      3 flakey assertions had to be retried twice
    

    Failing build with flaky failures that did not resolve:

    5 passed
    2 failed
      1 flakey assertion failed even after being retried
      1 flakey assertion failed even after being retried twice
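A sketch of that retry/hook lifecycle (hypothetical runner internals; runHooks and runAttempt are placeholders):

async function runWithRetries(test, maxRetries) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    await runHooks('beforeEach', test); // beforeEach hooks rerun on every attempt
    const passed = await runAttempt(test);
    if (passed) {
      await runHooks('afterEach', test);
      await runHooks('afterEach.always', test);
      return {passed: true, retries: attempt};
    }
    await runHooks('afterEach.always', test); // only .always hooks run after a failed attempt
  }
  return {passed: false, retries: maxRetries}; // reported as a flaky failure
}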
    

Which questions still need to be answered?

  • Is it flakey or flaky? (Both are correct, flaky seems to be more prevalent).
  • Do we alias the less prevalent spelling? Either way, only the more prevalent should appear in the docs. If we do alias, we should create a linter rule to allow users to enforce a particular spelling.
  • Do we allow for veryFlaky assertions? If so, how do we handle a scenario where the first test run has a flaky failure, the second has only a veryFlaky failure? I think you just tap out at veryFlaky max retries.

@novemberborn (Member)

Is it flakey or flaky? (Both are correct, flaky seems to be more prevalent).

I too prefer flaky because it's shorter. Also: Scotch whisky 😉

Do we alias the less prevalent spelling? Either way, only the more prevalent should appear in the docs. If we do alias, we should create a linter rule to allow users to enforce a particular spelling.

We could fail with a helpful error message if flakey is used.

Do we allow for veryFlaky assertions? If so, how do we handle a scenario where the first test run has a flaky failure, the second has only a veryFlaky failure? I think you just tap out at veryFlaky max retries.

I don't see the point. If it's going to be flaky it's going to be flaky. Configuring the number of retries is fine, but distinguishing levels of flakiness seems overkill. Let's start with just flaky.

@jamestalmage (Contributor)

I don't see the point. If it's going to be flaky it's going to be flaky. Configuring the number of retries is fine, but distinguishing levels of flakiness seems overkill. Let's start with just flaky.

Are you suggesting removing veryFlaky on assertions (t.flaky.is(...), etc.), or eliminating it from the test API as well: test.flaky(t => {...})?

See #934 (comment).

@sindresorhus (Member) commented Jun 26, 2016

I think we should go with flaky, but no aliasing. Instead just throw a helpful error message: "Did you mean flaky?" if they write flakey.

I also don't see the point of veryFlaky. Let's keep it simple.

I don't buy the argument for flaky assertions. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test. I would also consider having flaky tests an anti-pattern. It should only be temporary until you have time to make it non-flaky.

Regarding a global --flaky config:

Seems reasonable, but I'm sure you'll get someone who asks for the ability to retry more sometime down the road.

We tend to not add things for imaginary use-cases. Can't we just wait until some actual use-cases come in? I think 2 retries is a very good default, and I can't see why anyone would actually need to change that.

I much prefer the "keep it simple until someone complains" methodology.

@novemberborn (Member)

Are you suggesting removing veryFlaky on assertions (t.flaky.is(...), etc.), or eliminating it from the test API as well: test.flaky(t => {...})?

Both.

I don't buy the argument for flaky assertions. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test.

You could write a flaky assertion as a way to sanity check whether your test should be able to pass. E.g. a method that sometimes returns null, using t.flaky.true(result !== null) would be the sanity check, after which other assertion failures are actual test failures that don't require the test to be rerun.
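Fleshing that out with the proposed (unshipped) API, where fetchMaybeNull is a placeholder for the flaky method:

test('result is well-formed', async t => {
  const result = await fetchMaybeNull();
  t.flaky.true(result !== null); // flaky sanity check: a failure here retries the test
  t.is(result.status, 'ok');     // a failure here is a real bug: no retry
});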

@jamestalmage (Contributor)

You could write a flaky assertion as a way to sanity check whether your test should be able to pass

Yes, that is what I was thinking. It makes it easier to be explicit about what is allowed to be flaky. In that case, though, you probably should write two or more tests: a normal test that covers only the non-flaky parts, and a flaky test for the assertions that are flaky.

@jamestalmage (Contributor)

I think 2 retries is a very good default, and I can't see why anyone would actually need to change that.

@Awk34 mentioned he has a mocha test that needs to be rerun three times in CI to prevent failing the build. veryFlaky was my attempt at a compromise that didn't end up with an API like:

test.retry(3)(t => {
  // ...
});

We could eliminate veryFlaky, but then I think allowing --flaky config becomes more important.

I think allowing configuration is valuable if you've got specific platforms where a test is flaky (looking at you AppVeyor!!). I would probably set up AVA's own tests something like this:

{
  "ava": {
    "flaky": 0
  },
  "scripts": {
    "test": "ava",
    "test-appveyor": "ava --flaky=1"
  }
}
This would enforce that we only allow tests to be flaky on AppVeyor.

Still, there are easy workarounds for this as well:

import test from 'ava';

// only allow flaky on AppVeyor
const flaky = process.env.APPVEYOR ? test.flaky : test;

flaky(t => {
  // ...
});

@novemberborn (Member)

You could write a flaky assertion as a way to sanity check whether your test should be able to pass

Yes, that is what I was thinking. It makes it easier to be explicit about what is allowed to be flaky. In that case, though, you probably should write two or more tests: a normal test that covers only the non-flaky parts, and a flaky test for the assertions that are flaky.

My point was that you needed to test the result of a flaky function. There wouldn't be any non-flaky part to test without first verifying that the function wasn't flaky to begin with.

Taken to an extreme you could argue you don't need test.flaky() if you can write flaky assertions.

@sindresorhus (Member)

@Awk34 mentioned he has a mocha test that needs to be rerun three times in CI to prevent failing the build.

More than 2 retries just feels too flaky. If you really need that, you should rethink your tests. I see flaky as giving the computer a couple more chances to do the right thing. If it can't manage that after two additional tries, I don't think it ever can reliably. We shouldn't base our decisions on one person's needs. Let's wait for more use-cases.

I think allowing configuration is valuable if you've got specific platforms where a test is flaky (looking at you AppVeyor!!)

IMHO a test is either non-flaky on all systems, or flaky. If a test is flaky on AppVeyor, it's flaky in nature and should be rethought, or be marked as flaky locally too.

@Awk34 (Author) commented Jul 5, 2016

@sindresorhus I think it's actually 2 tries for my E2E tests, and 3 tries for the Sequelize tests (tries, not retries). Anyhow, the Sequelize tests for my project are too flakey. I don't really use SQL too often, so I've been lazy about looking into whether I can fix those XD.

Also, as a side note, the reason this came up is that I'm looking into using AVA for my Yeoman generator, another project you have a hand in. If you're interested: https://github.com/angular-fullstack/generator-angular-fullstack/tree/perf/gen-tests :D

@vadimdemedes added the "enhancement", "new functionality", and "question" labels on Aug 9, 2016
@vadimdemedes changed the title from "[Feature] Retry flakey tests" to "Retry flakey tests" on Aug 9, 2016
@novemberborn (Member)

I think we should:

  • Add t.flaky assertions
  • If a flaky assertion fails, retry the test
    • Run afterEach.always
    • Re-run any beforeEach hook
    • Re-run test
    • Max two times
  • Report flaky failures (#934 (comment))

I'm not in favor of test.flaky. Users should be encouraged to determine exactly what is making their test flaky.

@vadimdemedes (Contributor)

As for me, I'm opposed to t.flaky assertions in favor of test.flaky, for the exact same reasons given by @novemberborn and @sindresorhus:

Users should be encouraged to determine exactly what is making their test flaky.

I don't buy the argument for flaky assertions. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test. I would also consider having flaky tests an anti-pattern. It should only be temporary until you have time to make it non-flaky.

@novemberborn (Member)

@vdemedes I'm in favor of t.flaky actually 😉

@vadimdemedes (Contributor)

@novemberborn Yeah, but your reasoning supports my preference for test.flaky just as well 😆

@novemberborn (Member)

I don't buy the argument for flaky assertions. IMHO if you have flaky assertions mixed with other working assertions, you should just move them into a separate test.

That might not always be possible.

I would also consider having flaky tests an anti-pattern. It should only be temporary until you have time to make it non-flaky.

Of course.

@taylor1791

It is unclear to me whether there is a consensus on how to implement this feature. If there is, I would like to take a stab at implementing it next week.

@vadimdemedes (Contributor)

I just realized something. We could skip adding test.flaky in favor of using "retry" modules, like https://github.com/zeit/async-retry:

test('retry me', retry(async () => {
  // retry the test, if it throws
}));

I like this solution because that way we avoid adding more stuff to the API and we let users handle their unique use cases.

What do you think?
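A minimal user-land wrapper in that spirit (hypothetical; note that async-retry runs its function immediately, so a test-friendly helper has to defer execution, and a plain wrapper like this only catches thrown errors, not assertion failures recorded on t):

const retry = (fn, attempts = 2) => async t => {
  for (let i = 0; ; i++) {
    try {
      return await fn(t);
    } catch (error) {
      if (i >= attempts) throw error; // out of attempts: let the test fail
    }
  }
};

test('retry me', retry(async t => {
  // rerun on a thrown error, up to 2 extra attempts
}));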

@jfmengels (Contributor)

I like the idea @vadimdemedes. But I'm not sure it would always work out nicely. I'm thinking of t.fail(), which will fail the test anyway.

let failed = false;
test('retry me', retry(async t => {
  if (!failed) {
    failed = true;
    t.fail();
  }
}));

This should retry the test and it should pass the second time, but IIRC calling t.fail() will forcefully fail the test whatever happens next.

That said, maybe the retry module (maybe some specific ava-retry module, then?) could pass a modified t object whose t.fail() method won't forcefully fail the test?

@novemberborn (Member)

Decorating t seems brittle. I'd rather we do this properly.

test.flaky is probably easier to implement, so we could start there. I still like t.flaky as a way to force you to really think about what is flaky about your tests, but perhaps I'm overthinking it.

@bbiagas commented Mar 9, 2017

I am just a random person on the internet, but if I can throw in my two cents, I think a configurable number of retries would be great. The opinions against it have merit, but I also think there's not a clear definition of what "flaky" means. A flaky test is (in theory) one that passes almost all of the time but occasionally doesn't, for reasons which are not obvious. If a failure turns out to be consistent or obvious for whatever reason, then the test isn't flaky; it's just a failing (or poorly written) test.

Actual flaky tests are often symptoms of deep or intermittent issues. The "flake" is telling you something useful that you should be inspecting. And sometimes, looking into these tests means you may want, or need, to rerun them many, many times to try to recreate the circumstances which caused them to fail.

I have a real world example that happened at work recently; I am using Ava to test an Electron app with Spectron (webdriverio). I found that a particular test would fail every now and then for reasons we couldn't explain. It turned out the test was catching a serious (for us) but rare issue our users had reported to us only once or twice before, and a bug in the application code was the culprit. But we needed to run that test hundreds of times and build a better histogram before we could make that connection, determine the magnitude of the problem, and isolate the cause. A convenient way to keep retrying would have been great.

I might go one step further and say it would actually be handy to retry ANY test a configurable number of times. What better way to ensure an actual flaky test has been fixed than being able to easily run it lots of times, instead of hoping you catch it in CI again? Or test your tests a little more and maybe catch issues sooner. This is probably not practical, but I would use it sometimes.

I would also suggest that the term used is "retry" or something similar, and not "flaky". There is some baggage among test professionals that comes with the word "flaky", the least of which is that it's not immediately clear what it means.

Just a thought. We're really liking Ava at work so thank you all for your continued hard work and dedication.

@novemberborn (Member)

Hi @bbiagas thanks for your comment.

I think the scenario you ran into is probably best solved by reproducing the flaky test in a small script that you can run in a loop.

What I'm advocating for here is a way to deal with tests that are flaky due to circumstances out of their control, and ideally not bugs. It's helpful if they're retried once or twice, because you can't always fix those circumstances, but not so often that you end up ignoring bugs. I suppose it's the difference between "ugh CI failed again because so-and-so timed out, let's just restart it" and "ugh this test fails a lot, I should look into that". It's a tough balance to strike though.
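For example, a plain script along those lines, run outside AVA (runTheFlakyOperation is a placeholder for the code under suspicion):

(async () => {
  let failures = 0;
  for (let i = 0; i < 500; i++) {
    try {
      await runTheFlakyOperation();
    } catch (error) {
      failures++;
      console.error(`run ${i} failed: ${error.message}`);
    }
  }
  console.log(`${failures}/500 runs failed`); // a crude measure of flakiness
})();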

@eirslett

Was there any progress on this issue? I'm trying to run Selenium tests with AVA, and the lack of retry support is painful (browser integration tests are the most unstable tests...). I'm thinking Mocha is maybe a better test runner than AVA for Selenium, since Mocha has retry support?

From the thread, it looks like everybody agrees that retry support would be good, but there are a lot of ideas about how it should be implemented so it covers every use case under the sun... In the spirit of "don't let perfect be the enemy of good", it would be useful to just have a bare-bones retry implementation similar to Mocha's. Or provide the opportunity for end users to implement their own custom retry logic, like TestNG's RetryAnalyzer interface?

t.retry(3); // simple version

t.retry(data => data.failureCount < parseInt(process.env.RETRY_COUNT)); // tailored

@sholladay commented Oct 16, 2017

@eirslett right now, Intern is the best test framework for WebDriver tests. They have put a ton of effort into making them more reliable with Leadfoot and DigDug (both ship with Intern). I have sponsored some features over there, through SitePen, for a big project that needed extensive WebDriver tests. And I, too, wanted retry because we had flaky tests. Turns out wallpapering over them isn't such a good idea. 😃

If retries were to be added to AVA, I definitely think t.flaky() is better by a long shot. It localizes the effect. And more importantly, it would better equip AVA to help you deal with flakiness.

t.flaky() could be much more interesting than a simple retry.

  • Could keep statistics about flakiness and report about them.
  • Could temporarily set DEBUG env var and increase log level.
  • Could log a reminder to remove .flaky() if it doesn't fail for a long time.
  • Could be smart about which errors make sense to retry.

In short, I don't think t.flaky() should have the goal of getting your test to pass. Quite the contrary, I think it should help you figure out when and how it fails. In fact, perhaps t.flaky() should intentionally run your operation dozens or hundreds of times until it fails and then skip cleanup, or log if it never fails.

@novemberborn (Member)

Very interesting suggestions @sholladay!

At this stage we're looking for somebody willing to champion this feature. It's not a priority for the core team.

@novemberborn (Member)

#1692 contains a proposal for tests to try some assertions, and if necessary discard the results. This is pretty close to the retry behavior talked about here.

Intriguingly, the retry logic can be built on top of this proposal, in a third-party module. I'm closing this issue in favor of that approach.

If you want to help out please drop by #1692. If you're writing such a third-party module let us know and we can give feedback and promote it.

Thanks all for the discussion in this thread. There are plenty of ideas here for somebody to tackle.
