fix(sdk): Preventing starting a new request if the previous didn't finish #2414
Conversation
This patch moves `SlidingSync::response_handling_lock` inside `SlidingSyncInner`. There is no reason why it's stored inside `SlidingSync`.
This patch prevents starting a new request if the previous one didn't finish.

Imagine the following scenario: A request $R_1$ is sent. A response $S_1$ is received and is being handled. In the meantime, the sync-loop is instructed to skip over any remaining work in its iteration and to jump to the next iteration. As a consequence, the handling of $S_1$ is detached, but continues to run. In the meantime, a new request $R_2$ starts. Since $S_1$ has _not_ finished being handled, the `pos` isn't updated yet, and $R_2$ starts with the _same_ `pos` as $R_1$.

The impacts are the following:

1. Since the `pos` is the same, even if some parameters are different, the server will reply with the same response. It's a waste of time and resources (incl. network).
2. Receiving the same response could have corrupted the state. That has been fixed in matrix-org#2395 though.

Point 2 has been addressed, but point 1 remained to be addressed. This patch fixes point 1.

How? It changes the `RwLock` around `SlidingSyncInner::position` to a `Mutex`. An `OwnedMutexGuard` is fetched by locking the mutex when the request is generated (i.e. when `pos` is read to be put in the new request). This `OwnedMutexGuard` is kept during the entire lifetime of the request, extended to the response handling. It is dropped/released when the response has been fully handled, or if any error happens along the way.

It means that it's impossible for a new request to be generated and sent while a previous request/response cycle is still running. This solves point 1 in the case of a successful response; otherwise the `pos` isn't updated because of the error.
Codecov Report

Patch coverage:

```
@@            Coverage Diff            @@
##              main     #2414   +/-   ##
=========================================
  Coverage    77.06%    77.06%
=========================================
  Files          181       181
  Lines        19135     19139      +4
=========================================
+ Hits         14746     14750      +4
  Misses        4389      4389
```

☔ View full report in Codecov by Sentry.
```rust
// Release the position markers lock.
// It means that other request can start be sent.
drop(position_guard);
```
It's not necessary per se to have the `drop` here, since it will be dropped naturally at the end of the function, which happens on the next line. We could keep this if we wanted to keep the nice comment, though.
Yeah I know it's implicit, but I prefer to get an explicit drop here so that the whole logic is clear and documented.
```rust
past_positions.push(position_guard.clone());
```
```rust
// Release the position markers lock.
// It means that other request can start be sent.
```
nit: missing/extra word here?
```rust
) -> Result<(
    v4::Request,
    RequestConfig,
    BTreeSet<OwnedRoomId>,
    OwnedMutexGuard<SlidingSyncPositionMarkers>,
)> {
```
Maybe it's time to return a struct with named fields instead of individual undocumented fields like this?
Agree. I was thinking of addressing that in another PR, thoughts?
Sounds fine to me 👍
Thanks for fixing this.