Skip to content

minor: pin to async-std 1.5.x #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 28, 2020
Merged

minor: pin to async-std 1.5.x #187

merged 1 commit into from
May 28, 2020

Conversation

saghm
Copy link
Contributor

@saghm saghm commented May 28, 2020

No description provided.

@saghm saghm marked this pull request as ready for review May 28, 2020 17:05
@saghm saghm merged commit dfd951f into mongodb:master May 28, 2020
@saghm saghm deleted the async-std-1.5 branch May 28, 2020 17:08
@yoshuawuyts
Copy link

@saghm quickly checking in: it seems like you're experiencing issues with async-std 1.6.0 -- could you maybe share more? If there's anything we can do on our end I'd love to help!

@saghm
Copy link
Contributor Author

saghm commented May 28, 2020

Hi @yoshuawuyts! Yeah, our CI did run into some failures with 1.6.0. The first issue was due to async-rs/async-std#795, which was happening because our logic to parse the connection string of the database from an environmental variable in the tests was async due to potential SRV lookups. We stored this in a lazy_static! variable, so we were using block_on due to not being able to use await. I imagine there's some workaround for this out there, but since we use separate connection strings for our SRV tests, we just fixed that by just not parsing SRV connection strings for the test variable.

The next issue we ran into was uncovering a potential race in one of our sessions tests; when we create a client for usage in tests, we immediately perform a couple of operations to store some info about the database (which we can use to skip tests for features that the database version doesn't support). Since we added implicit sessions support recently, each operation by default will checkout out a session from the pool, use that for the operation, and then immediately return it. We return sessions to the pool by having the Drop implementation on sessions spawn an task that checks it back in (which is needed since we can't acquire an async mutex in the drop method itself). In the past, these always got checked back in fairly quickly, and since the session pool is LIFO, these sessions didn't interfere with any of our tests that created their own sessions for usage. However, with 1.6.0, it appears that the tasks spawned in our drop methods were often taking a bit too long; we have a test that creates two sessions, drops one, waits a bit, drops the other, waits a bit, and then checks out twice and asserts on the session ids to ensure that they were checked out in reverse of the check in order. Because the sessions from the operations from the client creation boilerplate were taking a bit longer to be checked back in, this ended up sometimes causing one of them to be returned after one of the two sessions checked out in the test itself, making the assertion on the two session ids checked out the end fail. We fixed this by using an explicit session in our client creation boilerplate and marking it as dirty so that it would never get checked back into the pool, but it was a bit surprising to us that the background tasks in the drop methods were taking so much longer to complete, as we never had any failures for this test in 1.5.0.

I put the above two fixes in #182, but we started getting periodic test timeouts in CI that hadn't occurred before, such as this one (raw text log is here, although it's quite long). The relevant output from the tests here this:

[2020/05/27 23:40:39.350] running 3 tests
[2020/05/27 23:40:39.350] test sync::test::collection ... test sync::test::collection has been running for over 60 seconds
[2020/05/27 23:40:39.350] test sync::test::client ... test sync::test::client has been running for over 60 seconds
[2020/05/27 23:40:39.350] test sync::test::database ... test sync::test::database has been running for over 60 seconds
[2020/05/28 00:30:55.956] Command stopped early: context canceled

I haven't had time to look into what's causing these timeouts, as I couldn't immediately reproduce them locally, and they only happen on a few tasks of each CI run, so for now, we're just pinning to 1.5.x while we're finishing up the last few tickets we need for our 1.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants