-
Notifications
You must be signed in to change notification settings - Fork 888
Reduce lock held duration in ConcurrencyLimitingRequestThrottler #1957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce lock held duration in ConcurrencyLimitingRequestThrottler #1957
Conversation
fwiw, looks like this would have a merge conflict with #1950 , but the same transform can be done and both would work together |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch @jasonk000 and fantastic analysis! It definitely does seem like this would cause pile requests upon handling the request in onThrottleReady
.
I'm a tentative +1, assuming this gets updated after #1950 is merged. Willing to give this a quick second look.
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love what you did with making the test based on waiting for the countdown latch 👍, will definitely make the test reliable. Had a few small suggestions
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Outdated
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Outdated
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Outdated
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Outdated
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look great, thanks! Just a small suggestion to assert that threads complete.
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Show resolved
Hide resolved
...tax/oss/driver/internal/core/session/throttling/ConcurrencyLimitingRequestThrottlerTest.java
Show resolved
Hide resolved
Everything looks great, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed internally +1
@jasonk000, we just got #1950 in, I rebased locally and there were no conflicts on this branch (does not compile though). I went ahead and created a JIRA for this: CASSANDRA-19922 With #1950 merged, I think we just need to have Woud you mind rebasing your branch, making that change and then squashing all of your commits and including this in the last line of the commit commit message?
After that I can merge it and we'll get this included in the next release! 🎉 |
It might take some (small) time for callback handling when the throttler request proceeds to submission. Before this change, the throttler proceed request will happen while holding the lock, preventing other tasks from proceeding when there is spare capacity and even preventing tasks from enqueuing until the callback completes. By tracking the expected outcome, we can perform the callback outside of the lock. This means that request registration and submission can proceed even when a long callback is being processed. patch by Jason Koch; Reviewed by Andy Tolbert and Chris Lohfink for CASSANDRA-19922
19aaa14
to
aa6e5ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, also reviewed internally
Thank you @tolbertam , appreciate the review feedback & guidance. Positive experience. Thanks! |
Thank you for the great fix and the tests @jasonk000! 🚀 |
@charispav this PR may reduce the symptoms you are seeing but imho won't get rid of them completely. In fact this is explained in this chapter of the docs:
If you are using the reactive API, I'd suggest that you try instead to throttle the upstream, e.g. using a token-bucket algorithm. |
@adutra thanks for your prompt response. In fact, even when Cassandra eventually returns to normal operation, the block remains leaving the application in a hanging state. How would you explain this? Why does the blocked threads not get released afterwards? Does the deadlock happen between our own Vert.x threads and DataStax threads? How could we eliminate (or, at least mitigate) the issue if we prefer keeping this throttler without developing our own token-bucket-based solution? Is there any configuration option that might help? |
I think the current implementation should be good except under very extreme scenarios, in which case you probably want a different design anyway. It may be possible to develop a more-lock-free implementation, and I did prototype it for this, but it likely needs change to semantics/behaviour to go lock-free in the processing path, and I haven't personally yet seen the need to develop it. If you suspect the driver itself has some deadlock or performance issues, the best way to proceed will be to share some stack traces and in particular the lines where the code is stalled and state of queue/counters. |
It might take some (small) time for callback handling when the throttler request proceeds to submission.
Before this change, the throttler proceed request will happen while holding the lock, preventing other tasks from proceeding when there is spare capacity and even preventing tasks from enqueuing until the callback completes.
By tracking the expected outcome, we can perform the callback outside of the lock. This means that request registration and submission can proceed even when a long callback is being processed.