It turns out it is possible to overload datashards with reads in a subtle way: inflight is essentially reported as zero, while the datashard's mailbox holds so many events that there's no chance for each read to be processed before the client-side timeout. When the mailbox is overloaded like this, TEvReadCancel is never processed before the request completes, which means cancellation doesn't really work.
This happens because simple reads are processed in their event handler, and even the pipeline transaction is often executed directly in the event handler. Since a cancellation can never arrive before the request it cancels, such simple requests are essentially uncancellable.
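To make the failure mode concrete, here is a toy model (hypothetical types, with a plain std::deque standing in for the actor mailbox, not the real NActors runtime) showing why in-handler execution defeats cancellation:

```cpp
// Toy model: the cancel event is queued behind the read, so by the time
// the actor sees it, the read has already executed inside its handler.
#include <cstdio>
#include <deque>
#include <variant>

struct TEvRead       { int ReadId; };
struct TEvReadCancel { int ReadId; };
using TEvent = std::variant<TEvRead, TEvReadCancel>;

struct TDataShard {
    void Handle(const TEvRead& ev) {
        // Simple reads run right here, inside the event handler.
        std::printf("executed read %d\n", ev.ReadId);
    }
    void Handle(const TEvReadCancel& ev) {
        // Too late: the read above already completed.
        std::printf("cancel for read %d arrived after execution\n", ev.ReadId);
    }
};

int main() {
    TDataShard shard;
    std::deque<TEvent> mailbox{TEvRead{1}, TEvReadCancel{1}};
    while (!mailbox.empty()) {
        std::visit([&](const auto& ev) { shard.Handle(ev); }, mailbox.front());
        mailbox.pop_front();
    }
}
```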
We need to make sure we never execute read requests directly, and that we scan the actor mailbox before running every request, because the request may have already been cancelled.
Instead of executing the read via the pipeline transaction directly, we need to always put new reads in a queue and handle them one by one, scheduling a new event while executing the previous request, similarly to how the progress transaction executes immediate transactions. This way we will always scan the mailbox before each execution attempt, giving TEvReadCancel a chance to remove reads from the queue. We will also be able to observe the queue size for reads, provided this mailbox processing is fast enough. I think we would probably want to move read keys into an event payload, so they are not parsed from protobuf when the request first arrives (this parsing may be more expensive than the read itself).
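A minimal sketch of the proposed scheme (hypothetical names, TEvProcessReadQueue is invented for illustration, and a std::deque again stands in for the mailbox): reads are only enqueued, and a self-sent continuation event executes them one at a time, so the whole mailbox, including any TEvReadCancel, is drained before each execution attempt.

```cpp
#include <cstdio>
#include <deque>
#include <unordered_set>
#include <variant>

struct TEvRead             { int ReadId; };
struct TEvReadCancel       { int ReadId; };
struct TEvProcessReadQueue {};  // hypothetical self-sent continuation event
using TEvent = std::variant<TEvRead, TEvReadCancel, TEvProcessReadQueue>;

struct TDataShard {
    std::deque<TEvent>* Mailbox = nullptr;  // stand-in for the actor system
    std::deque<int> ReadQueue;              // reads waiting for execution
    std::unordered_set<int> Cancelled;
    bool Scheduled = false;                 // continuation already in flight?

    void Handle(const TEvRead& ev) {
        ReadQueue.push_back(ev.ReadId);     // enqueue instead of executing
        if (!Scheduled) {
            Scheduled = true;
            Mailbox->push_back(TEvProcessReadQueue{});
        }
    }
    void Handle(const TEvReadCancel& ev) {
        Cancelled.insert(ev.ReadId);        // removes the read before it runs
    }
    void Handle(const TEvProcessReadQueue&) {
        Scheduled = false;
        while (!ReadQueue.empty() && Cancelled.count(ReadQueue.front())) {
            ReadQueue.pop_front();          // skip cancelled reads
        }
        if (ReadQueue.empty()) return;
        std::printf("executed read %d\n", ReadQueue.front());
        ReadQueue.pop_front();
        if (!ReadQueue.empty()) {           // reschedule for the next read,
            Scheduled = true;               // yielding the mailbox in between
            Mailbox->push_back(TEvProcessReadQueue{});
        }
    }
};

int main() {
    TDataShard shard;
    std::deque<TEvent> mailbox{TEvRead{1}, TEvRead{2}, TEvReadCancel{2}};
    shard.Mailbox = &mailbox;
    while (!mailbox.empty()) {
        TEvent ev = mailbox.front();
        mailbox.pop_front();
        std::visit([&](const auto& e) { shard.Handle(e); }, ev);
    }
    // Only read 1 executes: the cancel for read 2 is seen before its turn.
}
```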
This shouldn't affect throughput, since there will be at most one extra event per read, and since adding events to a running mailbox is very cheap, as long as the datashard doesn't fully consume a thread it shouldn't affect latency much either (provided preliminary request processing is very cheap, which means we should probably move as much validation as possible out of the enqueue step). We could even re-send the same event handle to avoid an extra allocation, adding a flag marking it "ready for processing". If TEvReadCancel arrives first, the enqueued event will be marked as cancelled and ignored.
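A hedged sketch of that re-send optimization (hypothetical fields and structure, not YDB's actual event classes): the same TEvRead allocation is bounced back to self with ReadyForProcessing set, and a cancel just flips a Cancelled flag on it, so no separate queue entry or second allocation is needed.

```cpp
#include <cstdio>
#include <deque>
#include <memory>
#include <unordered_map>

struct TEvRead {
    int ReadId;
    bool ReadyForProcessing = false;  // set on first pass through the mailbox
    bool Cancelled = false;           // set by a cancel while re-enqueued
};

struct TDataShard {
    std::deque<std::unique_ptr<TEvRead>>* Mailbox = nullptr;
    std::unordered_map<int, TEvRead*> Pending;  // reads awaiting second pass

    void Handle(std::unique_ptr<TEvRead> ev) {
        if (!ev->ReadyForProcessing) {
            // First pass: cheap validation only, then bounce the very same
            // allocation back through the mailbox.
            ev->ReadyForProcessing = true;
            Pending[ev->ReadId] = ev.get();
            Mailbox->push_back(std::move(ev));
            return;
        }
        Pending.erase(ev->ReadId);
        if (ev->Cancelled) {
            std::printf("read %d cancelled before execution\n", ev->ReadId);
            return;
        }
        std::printf("executed read %d\n", ev->ReadId);
    }
    void Cancel(int readId) {
        if (auto it = Pending.find(readId); it != Pending.end()) {
            it->second->Cancelled = true;  // the enqueued event gets ignored
        }
    }
};

int main() {
    std::deque<std::unique_ptr<TEvRead>> mailbox;
    mailbox.push_back(std::make_unique<TEvRead>(TEvRead{1}));
    TDataShard shard;
    shard.Mailbox = &mailbox;
    bool cancelSent = false;
    while (!mailbox.empty()) {
        auto ev = std::move(mailbox.front());
        mailbox.pop_front();
        shard.Handle(std::move(ev));
        // The cancel arrives while the re-sent event is still in the mailbox.
        if (!cancelSent) { shard.Cancel(1); cancelSent = true; }
    }
    // Output: "read 1 cancelled before execution".
}
```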