Skip to content

Fix failing Github Actions Tests #1753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 12, 2020
Merged

Conversation

TomGranot
Copy link
Contributor

Prefix the www subdomain for URLs in the testMaxTotalConnections test so it doesn't fail on DNS weirdness.

Should unblock quite a few PRs in the way, the most recent one being #1752.

Should probably investigate further / refactor (opened for that), but this blocks the thread on everything else on the repo, so I'm calling it.

… so it doesn't fail on DNS weridness. Should probably investigate further / refactor, but this blocks the thread on everything else on the repo, so I'm calling it.
@TomGranot
Copy link
Contributor Author

TomGranot commented Dec 12, 2020

Failed again on GitHub Actions, but passed on Travis (for the PR build).

GA's failed tests:

Failed tests: 
  AsyncStreamHandlerTest.asyncOptionsTest:434->lambda$asyncOptionsTest$18:435->lambda$null$17:459 expected [4] but found [5]
  MaxTotalConnectionTest.testMaxTotalConnections:108 expected [null] but found [java.util.concurrent.TimeoutException: Request timeout to www.github.com:443 after 5000 ms]
  ReactiveStreamsErrorTest.timeoutWithNoStatusLineSent:78->expectReadTimeout:276 Expected read timeout, but was java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:37985 after 3000 ms expected [true] but found [false]
  TextMessageTest.onFailureTest » Test 
Expected exception of type class java.ne...

Looks like a few DNS failures and some timeout errors.

Running the entire suite of tests locally as a sanity check, no failures and runs in about 3 minutes which is ~25% of the time of the agents on Github Actions, doesn't fail on any test. Let's go one by one over the tests and see what's up:

  • For AsyncStreamHandlerTest.asyncOptionsTest - This is a re-occurence of our friend AsyncStreamHandlerTest.asyncOptionsTest() is flaky #1735. Basically the server returns 4 instead of 5 supported methods when you happen to hit one server in apache.org, and 5 supported methods when you hit another. Since the logic behind the test can now be considered compromised, I'm going to refactor it to allow for the two possible cases.

  • For MaxTotalConnectionTest.testMaxTotalConnections - Since the https://github.com domain is only used in the context of this test throughout the codebase and can be swapped for something else, I swapped it for https://www.youtube.com for testing. Looks fine locally.

  • For ReactiveStreamsErrorTest - Timeout, I'm assuming this has to do with the resource cap on GitHub actions. Bumping up the request timeouts in 2 seconds to see if the tests pass then.

  • For TextMessageTest.onFailureTest - This got truncated, but is also a DNS error. Ther's a manual, non-existing websocket address there that the DNS resolver does not like. Let's see what happens when we upgrade the version of the OS.

Also, because I feel like playing around with the underlying agent, let's see what happens when I swap the OS the agent is running on from 16.04 to 20.04. Expect shenanigans.

@TomGranot TomGranot changed the title Fix maxTotalConnectionsTest DNS Failure Fix failing Github Actions Tests Dec 12, 2020
@TomGranot TomGranot mentioned this pull request Dec 12, 2020
@TomGranot
Copy link
Contributor Author

TomGranot commented Dec 12, 2020

Old run: https://github.com/AsyncHttpClient/async-http-client/runs/1543080666?check_suite_focus=true
New run: https://github.com/AsyncHttpClient/async-http-client/runs/1543236641?check_suite_focus=true

So I didn't get a new runner version, but I did of course get a new OS version and new virtual environment versions.

And 6 failed tests this time...

Failed tests: 
  AsyncStreamHandlerTest.asyncOptionsTest:434->lambda$asyncOptionsTest$18:435->lambda$null$17:467
  ReactiveStreamsErrorTest.neverSubscribingToResponseBodyHitsRequestTimeout:85->execute:292 » Timeout
  ReactiveStreamsErrorTest.readTimeoutCancelsBodyStream:188 Request should have timed out
  ReactiveStreamsErrorTest.readTimeoutInMiddleOfBody:116->execute:292 » Timeout
  ReactiveStreamsErrorTest.requestTimeoutCancelsBodyStream:227->execute:292 » Timeout
  TextMessageTest.onFailureTest » Test 
Expected exception of type class java.ne...

Made a very stupid mistake on the asyncOptionsTest, very tired - fixing now:)

I think the timeouts on the ReactiveStreamsErrosTest are due to the way the tests rely on one another (didn't look too deeply), so I'm restoring the original timeouts to see if they don't crash on the new runner.

Not seeing the DNS resolution error on the maxTotalConnections test, which is nice.

However, the same DNS resolution error (I think) on the TextMessageTest with the non-existing websocket. Looking while it's building. Dump:

Expected exception of type class java.net.UnknownHostException but got java.net.ConnectException: fv-az102-674: DNS name not found [response code 3]
Caused by: java.net.ConnectException: fv-az102-674: DNS name not found [response code 3]
Caused by: java.net.UnknownHostException: fv-az102-674: DNS name not found [response code 3]
Caused by: java.net.UnknownHostException: DNS name not found [response code 3]

So it's expecting an UnknownHostException but getting a ConnectException. A stupid solution would be to just allow for both exceptions to be thrown since I can see the thing I expect to happen (the DNS to fail) actually happens. The fact that the exception is different in this case does not indicate a failure in logic but rather a difference in instrumentation. And actually, a better way of checking that the exception is the correct one is by querying its value and checking that it is indeed "DNS name not found", and not simply query for the type of exception IMHO.

The interesting thing to ask is why is the exception different - I am running the same JDK locally as the agent, and would assume that we would receive the same exceptions. But since this is a case of more or less granularity for a specific test case, I think that the solution I suggested above works.

Looking at the old build, the only test that's still failing seems to be this exact one - pushing this now to see if I get the all-green.

@TomGranot TomGranot merged commit a1ea371 into master Dec 12, 2020
@TomGranot TomGranot deleted the fix-maxtotalconnections-test branch December 12, 2020 20:29
TomGranot added a commit that referenced this pull request Dec 13, 2020
* Fix typos

* Fix Failing Github Actions Tests (#1753)

* Bump GitHub Actions runner version to Ubuntu 20.04

* AsyncStreamHandlerTest.asyncOptionsTest - Allow for the appearance of the TRACE method in the server's response

* MaxTotalConnectionTest.testMaxTotalConnections - https://github.com --> https://www.youtube.com, https://google.com --> https://www.google.com 

* TextMessageTest.onFailureTest - Refactor logic for more granular testing (accept ConnectException in addition to UnknownHostException, but check for "DNS Name not found" in exception cause)

Co-authored-by: Tom Granot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant