-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StreamManager: retry with get result request on already exist errors #6345
StreamManager: retry with get result request on already exist errors #6345
Conversation
bafde82
to
a29fcc1
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6345 +/- ##
=======================================
Coverage 97.84% 97.84%
=======================================
Files 1110 1110
Lines 96597 96648 +51
=======================================
+ Hits 94516 94567 +51
Misses 2081 2081 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @verult noted that he tested this and demonstrated success using this PR locally.
@@ -319,7 +313,8 @@ def _get_retry_request_or_raise( | |||
error: quantum.StreamError, | |||
current_request, | |||
create_program_and_job_request, | |||
create_job_request: quantum.QuantumRunStreamRequest, | |||
create_job_request, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the type hint here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still there, just moved down by 1 param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not (typically) how type annotations work in python. They should be on each parameter
) | ||
# If the program already exists and is created as part of the stream client, the job | ||
# should also exist because they are created at the same time. | ||
# If the job is missing, a `CreateQuantumJobRequest` will be issued after a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the program get created w/o the job if they're created together in a CreateProgramAndJobRequest? Does a closed stream kill the server-side handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified a bit in the comment - it's for the unlikely case that the program is created outside StreamManager
. The logic here doesn't explicitly try to solve this case, but it just happens to do the right thing.
This PR fixes a race condition that occurred roughly every 10-15min by adding a retry with
GetQuantumResultRequest
whenStreamManager
receives a program or job already exists error. The sequence is as follows:This would cause issues when a user specifies a
program ID
orjob ID
inEngine.run_sweep()
orEngineProcessor.run_sweep()
rather than letting the client generate the ID, because there could be a real ID conflict. However, the recommended path of usingProcessorSampler.run_sweep()
does not specify IDs, and we're considering deprecating this ability to specify IDs. It's otherwise hard to discern between a real conflict vs. the race condition.This is now the error handling logic after a stream breakage:
where
and the dot indicates the starting state.
cc @senecameeks