
Reduce lock held duration in ConcurrencyLimitingRequestThrottler #1957


Merged: 1 commit merged into apache:4.x from the concurrency-limit-throttler-unblock branch on Sep 13, 2024

Conversation

jasonk000 (Contributor)

When a throttled request proceeds to submission, handling its callback can take some (small) amount of time.

Before this change, the throttler invoked the request's proceed callback while holding the lock, preventing other tasks from proceeding even when there was spare capacity, and even preventing tasks from enqueuing until the callback completed.

By tracking the expected outcome, we can perform the callback outside of the lock. This means that request registration and submission can proceed even while a long callback is being processed.
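
To make the pattern concrete, here is a minimal, self-contained sketch of the "decide under the lock, invoke the callback outside it" idea. The class, fields and signalDone method are invented for this example; onThrottleReady mirrors the driver's callback terminology, but none of this is the driver's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only, not the driver's implementation.
final class SketchThrottler {

  interface Throttled {
    void onThrottleReady(boolean wasDelayed);
  }

  private final ReentrantLock lock = new ReentrantLock();
  private final Deque<Throttled> waitQueue = new ArrayDeque<>();
  private final int maxConcurrentRequests;
  private int concurrentRequests;

  SketchThrottler(int maxConcurrentRequests) {
    this.maxConcurrentRequests = maxConcurrentRequests;
  }

  void register(Throttled request) {
    boolean admitted;
    lock.lock();
    try {
      if (concurrentRequests < maxConcurrentRequests) {
        concurrentRequests++;
        admitted = true;          // only record the decision under the lock
      } else {
        waitQueue.add(request);   // admitted later when capacity frees up
        admitted = false;
      }
    } finally {
      lock.unlock();
    }
    // The potentially slow callback (query plan, request encoding, ...) now
    // runs without blocking other register()/signal callers.
    if (admitted) {
      request.onThrottleReady(false);
    }
  }

  void signalDone() {
    Throttled next;
    lock.lock();
    try {
      next = waitQueue.poll();    // choose the next request under the lock
      if (next == null) {
        concurrentRequests--;     // capacity freed, nothing waiting
      }
    } finally {
      lock.unlock();
    }
    if (next != null) {
      next.onThrottleReady(true); // callback invoked after the lock is released
    }
  }
}
```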

@jasonk000 (Contributor, Author) commented Sep 9, 2024

FWIW, it looks like this would have a merge conflict with #1950, but the same transform can be applied and both would work together.

@jasonk000 (Contributor, Author) commented Sep 10, 2024

Some additional details

Before -- we can see the contention on the right-hand side showing up as futex calls; only one thread at a time is allowed to run LoadBalancing..::newQueryPlan or CqlRequestHandler::sendRequest, even when there is plenty of capacity in the limiter. So, although those components are designed to operate concurrently and the CQL execution happens in parallel, the preparation is forced to be single-threaded.
[profiling screenshot]

After -- query-plan generation flows freely; multiple queries can build their plans and submit them completely in parallel.
[profiling screenshot]

@tolbertam (Contributor) left a comment

Great catch @jasonk000 and fantastic analysis! It definitely does seem like this would cause requests to pile up while handling a request in onThrottleReady.

I'm a tentative +1, assuming this gets updated after #1950 is merged. Willing to give this a quick second look.

@tolbertam (Contributor) left a comment

Love what you did with making the test wait on the countdown latch 👍, that will definitely make the test reliable. Had a few small suggestions.

@tolbertam (Contributor) left a comment

Changes look great, thanks! Just a small suggestion to assert that threads complete.

@tolbertam (Contributor)

Everything looks great, thank you!

@clohfink self-requested a review September 13, 2024 14:45
@clohfink left a comment

reviewed internally +1

@tolbertam (Contributor)

@jasonk000, we just got #1950 in; I rebased locally and there were no conflicts on this branch (it does not compile, though).

I went ahead and created a JIRA for this: CASSANDRA-19922

With #1950 merged, I think we just need to have signalCancel updated in the same way as the changes you made for signalTimeout (a rough sketch of that shape follows after this comment).

Would you mind rebasing your branch, making that change, squashing all of your commits, and including this in the last line of the commit message?

patch by Jason Koch; Reviewed by Andy Tolbert and Chris Lohfink for CASSANDRA-19922

After that I can merge it and we'll get this included in the next release! 🎉
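
As a rough illustration of applying the same shape to signalCancel, continuing the hypothetical SketchThrottler from the sketch earlier in this thread (it reuses that example's lock, waitQueue and concurrentRequests fields; the driver's real method has more cases to handle):

```java
// Added to the SketchThrottler example above; illustrative only.
void signalCancel(Throttled cancelled) {
  Throttled next = null;
  lock.lock();
  try {
    if (!waitQueue.remove(cancelled)) {
      // The request had already been admitted: hand its slot to the next
      // waiter, or release the capacity if nobody is waiting.
      next = waitQueue.poll();
      if (next == null) {
        concurrentRequests--;
      }
    }
  } finally {
    lock.unlock();
  }
  if (next != null) {
    next.onThrottleReady(true); // callback runs after the lock is released
  }
}
```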

@jasonk000 force-pushed the concurrency-limit-throttler-unblock branch from 19aaa14 to aa6e5ea on September 13, 2024 16:27
@akhaku (Contributor) left a comment

+1, also reviewed internally

@jasonk000 (Contributor, Author)

Thank you @tolbertam, I appreciate the review feedback & guidance. Positive experience. Thanks!

@tolbertam (Contributor)

Thank you for the great fix and the tests @jasonk000! 🚀

@tolbertam merged commit 6d3ba47 into apache:4.x Sep 13, 2024
@charispav

Hi all,

While using the ConcurrencyLimitingRequestThrottler in our application (Java driver 4.17.0), when Cassandra gets overloaded and the throttling mechanism kicks in, we end up with a blocked-thread situation:
[thread dump screenshot]
As is evident, the throttler's lock mechanism leads to the Vert.x IO thread (worker thread) being blocked. In general, that thread pool is used for potentially blocking operations. This block causes a deadlock and hence makes the entire application unresponsive.

From an implementation perspective, we are using the reactive API (the executeReactive method) for async query execution, which in turn uses the throttling mechanism internally.

Does the improvement to the lock mechanism in this PR also mean that the blocking issue is fixed?
If not, how should this thread-blocking bug be handled? Should a separate Jira ticket be created?

@adutra (Contributor) commented Oct 21, 2024

@charispav this PR may reduce the symptoms you are seeing, but IMHO it won't get rid of them completely.

In fact this is explained in this chapter of the docs:

[Request throttlers] use locks internally, and depending on how many requests are being executed in parallel, the thread contention on these locks can be high: in short, if your application enforces strict lock-freedom, then these components should not be used.

If you are using the reactive API, I'd suggest that you try instead to throttle the upstream, e.g. using a token-bucket algorithm.
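
To illustrate that suggestion, here is a minimal token-bucket sketch; the class name, parameters and usage are invented for this example, and a real application might well prefer an existing rate-limiting library.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal non-blocking token bucket (illustrative only). A reactive caller
// checks tryAcquire() before invoking executeReactive() and defers or rejects
// the request itself instead of blocking a Vert.x thread.
final class TokenBucket {

  private static final class State {
    final double tokens;
    final long lastRefillNanos;
    State(double tokens, long lastRefillNanos) {
      this.tokens = tokens;
      this.lastRefillNanos = lastRefillNanos;
    }
  }

  private final double capacity;      // maximum burst size, in requests
  private final double refillPerNano; // requests added per nanosecond
  private final AtomicReference<State> state;

  TokenBucket(double requestsPerSecond, double burst) {
    this.capacity = burst;
    this.refillPerNano = requestsPerSecond / 1_000_000_000.0;
    this.state = new AtomicReference<>(new State(burst, System.nanoTime()));
  }

  /** Returns true if a request may proceed now; never blocks the caller. */
  boolean tryAcquire() {
    while (true) {
      long now = System.nanoTime();
      State current = state.get();
      double refilled =
          Math.min(capacity, current.tokens + (now - current.lastRefillNanos) * refillPerNano);
      if (refilled < 1.0) {
        return false; // no token available; defer or reject upstream
      }
      if (state.compareAndSet(current, new State(refilled - 1.0, now))) {
        return true;
      }
      // Lost a CAS race with another thread: re-read the state and retry.
    }
  }
}
```

With something like this in front of the driver, the application decides what to do with excess load (buffer, retry later, or fail fast) before any driver-internal lock is touched.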

@charispav

@adutra thanks for your prompt response.

In fact, even when Cassandra eventually returns to normal operation, the block remains, leaving the application in a hanging state.

How would you explain this? Why do the blocked threads not get released afterwards?

Does the deadlock happen between our own Vert.x threads and DataStax threads?

How could we eliminate (or at least mitigate) the issue if we prefer to keep this throttler rather than developing our own token-bucket-based solution? Is there any configuration option that might help?

@jasonk000 (Contributor, Author) commented Oct 22, 2024

@charispav

I think the current implementation should be good except under very extreme scenarios, in which case you probably want a different design anyway.

It may be possible to develop a more lock-free implementation, and I did prototype one for this, but it would likely need changes to semantics/behaviour to go lock-free in the processing path, and I haven't personally yet seen the need to develop it.

If you suspect the driver itself has a deadlock or performance issue, the best way to proceed would be to share some stack traces, in particular the lines where the code is stalled and the state of the queue/counters.
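
If it helps, a hypothetical helper along these lines can capture such stack traces from inside the application while it appears hung (standard java.lang.management APIs; the class and method names are invented for this example):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: dumps information for all live threads, requesting
// locked-monitor and locked-synchronizer details so lock owners are visible.
final class ThreadDumpUtil {
  static String capture() {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
      sb.append(info);
    }
    return sb.toString();
  }
}
```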

@jasonk000 deleted the concurrency-limit-throttler-unblock branch October 22, 2024 17:02