fix(sdk): Preventing starting a new request if the previous didn't finish #2414

Merged: 3 commits merged on Aug 17, 2023

@Hywan Hywan commented Aug 17, 2023

Imagine the following scenario:

A request $R_1$ is sent. A response $S_1$ is received and is being
handled. In the meantime, the sync-loop is instructed to skip over any
remaining work in its iteration and to jump to the next iteration. As a
consequence, the handling of $S_1$ is detached, but continues to run.
Meanwhile, a new request $R_2$ starts. Since $S_1$ has not finished being
handled, the pos isn't updated yet, and $R_2$ starts with the same pos
as $R_1$.

The impacts are the following:

  1. Since the pos is the same, even if some parameters are different,
    the server will reply with the same response. It's a waste of time
    and resources (incl. network).
  2. Receiving the same response could have corrupted the state. This has
    been fixed in feat(sdk): Skip Sliding Sync Response if it's been received already #2395
    though.

Point 2 has been addressed, but point 1 remains to be addressed. This
patch fixes point 1.

How? It changes the RwLock around SlidingSyncInner::position to
a Mutex. An OwnedMutexGuard is acquired by locking the mutex when
the request is generated (i.e. when pos is read to be put in the new
request). This OwnedMutexGuard is held for the entire lifetime
of the request, including the response handling. It is dropped/released
when the response has been fully handled, or if any error happens along
the way.

It means that it's impossible for a new request to be generated and
sent while a previous request and its response are still being processed.
It solves point 1 when the response is handled successfully; otherwise,
the pos isn't updated because of an error.
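
The locking scheme described above can be sketched as follows. This is a minimal, dependency-free illustration, not the SDK's code: it uses the blocking `std::sync::Mutex` and threads instead of the async mutex and `OwnedMutexGuard` named in the description, and `PositionMarkers`, `sync_once`, `on_locked` and `run_two_overlapping_syncs` are hypothetical names made up for the example.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the state protected by the position lock.
#[derive(Debug, Default)]
struct PositionMarkers {
    pos: Option<String>,
}

/// Simulates one sync-loop iteration and returns the `pos` the request was
/// sent with. `on_locked` is called once the lock has been acquired; it only
/// exists to make the example deterministic.
fn sync_once(
    position: &Mutex<PositionMarkers>,
    new_pos: &str,
    on_locked: impl FnOnce(),
) -> Option<String> {
    // The lock is taken when the request is generated, i.e. when `pos` is read.
    let mut position_guard = position.lock().unwrap();
    on_locked();
    let sent_with = position_guard.pos.clone();

    // Pretend to send the request and to handle the response (slow).
    thread::sleep(Duration::from_millis(50));

    // The response is fully handled: only now is `pos` updated.
    position_guard.pos = Some(new_pos.to_owned());

    // Release the lock explicitly; another request can now be generated.
    drop(position_guard);

    sent_with
}

/// Starts a first iteration on another thread, then a second one while the
/// first is still handling its response.
fn run_two_overlapping_syncs() -> (Option<String>, Option<String>) {
    let position = Arc::new(Mutex::new(PositionMarkers::default()));
    let (tx, rx) = mpsc::channel();

    let position_for_first = Arc::clone(&position);
    let first = thread::spawn(move || {
        sync_once(&position_for_first, "1", move || tx.send(()).unwrap())
    });

    // Wait until the first iteration holds the lock.
    rx.recv().unwrap();

    // This call blocks until the first iteration has released the lock,
    // so it cannot reuse the stale `pos`.
    let second = sync_once(&position, "2", || {});

    (first.join().unwrap(), second)
}
```

Calling `run_two_overlapping_syncs()` returns `(None, Some("1"))`: the second request is only generated after the first response has updated `pos`, so the two requests never share the same `pos`.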

Hywan added 2 commits August 17, 2023 09:16
This patch moves `SlidingSync::response_handling_lock` inside
`SlidingSyncInner`. There is no reason why it's stored inside
`SlidingSync`.
…nish.
@Hywan Hywan requested a review from a team as a code owner August 17, 2023 08:59
@Hywan Hywan requested review from bnjbvr and a team and removed request for a team August 17, 2023 08:59
codecov bot commented Aug 17, 2023

Codecov Report

Patch coverage: 90.90% and no project coverage change.

Comparison is base (0dac508) 77.06% compared to head (d946664) 77.06%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2414   +/-   ##
=======================================
  Coverage   77.06%   77.06%           
=======================================
  Files         181      181           
  Lines       19135    19139    +4     
=======================================
+ Hits        14746    14750    +4     
  Misses       4389     4389           
| Files Changed | Coverage Δ |
|---|---|
| crates/matrix-sdk/src/sliding_sync/mod.rs | 90.98% <88.23%> (+0.03%) ⬆️ |
| crates/matrix-sdk/src/sliding_sync/builder.rs | 66.95% <100.00%> (+0.88%) ⬆️ |
| crates/matrix-sdk/src/sliding_sync/cache.rs | 92.95% <100.00%> (ø) |


// Release the position markers lock.
// It means that other request can start be sent.
drop(position_guard);
Member

It's not necessary per se to have the drop here, since it will be dropped naturally at the end of the function, which happens on the next line. We could keep this if we wanted to keep the nice comment, though.

Member Author

Yeah I know it's implicit, but I prefer to get an explicit drop here so that the whole logic is clear and documented.

past_positions.push(position_guard.clone());

// Release the position markers lock.
// It means that other request can start be sent.
Member

nit: missing/extra word here?

Comment on lines +447 to +452
) -> Result<(
    v4::Request,
    RequestConfig,
    BTreeSet<OwnedRoomId>,
    OwnedMutexGuard<SlidingSyncPositionMarkers>,
)> {
Member

Maybe it's time to return a struct with named fields instead of individual undocumented fields like this?

Member Author

Agreed. I was thinking of addressing that in another PR, thoughts?

Member

Sounds fine to me 👍
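
For illustration only, the named-field struct suggested in this thread could look roughly like the sketch below. Nothing here comes from the follow-up PR: `GeneratedRequest` and its field names are hypothetical, the `Request`, `RequestConfig`, `RoomId` and `PositionMarkers` types are simplified placeholders for the SDK's real ones, and a borrowed `std::sync::MutexGuard` stands in for the `OwnedMutexGuard` of the actual signature.

```rust
use std::collections::BTreeSet;
use std::sync::{Mutex, MutexGuard};

// Placeholder types standing in for the SDK's real ones (assumptions).
struct Request {
    pos: Option<String>,
}
struct RequestConfig;
type RoomId = String;
struct PositionMarkers {
    pos: Option<String>,
}

/// Hypothetical replacement for the 4-element tuple: each value gets a name
/// and a place to document it.
struct GeneratedRequest<'a> {
    /// The request to send.
    request: Request,
    /// The configuration to send the request with.
    config: RequestConfig,
    /// The set of room IDs returned alongside the request.
    room_ids: BTreeSet<RoomId>,
    /// Lock on the position markers, held until the response is handled.
    position_guard: MutexGuard<'a, PositionMarkers>,
}

fn generate_request(position: &Mutex<PositionMarkers>) -> GeneratedRequest<'_> {
    // Lock first, then read `pos` under the lock to build the request.
    let position_guard = position.lock().unwrap();

    GeneratedRequest {
        request: Request { pos: position_guard.pos.clone() },
        config: RequestConfig,
        room_ids: BTreeSet::new(),
        position_guard,
    }
}
```

A caller would then read `generated.request` or `generated.position_guard` by name instead of destructuring a positional tuple, and the lock is still released when the struct (and the guard it owns) is dropped.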

@bnjbvr bnjbvr left a comment

Thanks for fixing this.

@Hywan Hywan merged commit 2c99810 into matrix-org:main Aug 17, 2023
2 participants