fix(sdk): Preventing starting a new request if the previous didn't finish #2414

Hywan · 2023-08-17T08:59:11Z

Imagine the following scenario:

A request $R_1$ is sent. A response $S_1$ is received and is being
handled. In the meantime, the sync-loop is instructed to skip over any
remaining work in its iteration and to jump to the next iteration. As a
consequence, the handling of $S_1$ is detached, but continues to run.
In the meantime, a new request $R_2$ starts. Since $S_1$ has _not_
finished being handled, the `pos` isn't updated yet, and $R_2$ starts
with the _same_ `pos` as $R_1$.

The impacts are the following:

1. Since the `pos` is the same, even if some parameters are different,
   the server will reply with the same response. It's a waste of time
   and resources (incl. network).
2. Receiving the same response could have corrupted the state. This has
   been fixed in feat(sdk): Skip Sliding Sync Response if it's been
   received already #2395, though.

Point 2 has been addressed, but point 1 remains to be addressed. This
patch fixes point 1.

How? It changes the `RwLock` around `SlidingSyncInner::position` to
a `Mutex`. An `OwnedMutexGuard` is obtained by locking the mutex when
the request is generated (i.e. when `pos` is read to be put in the new
request). This `OwnedMutexGuard` is kept for the entire lifetime
of the request, through to the end of the response handling. It is
dropped/released when the response has been fully handled, or if any
error happens along the way.

It means that it's impossible for a new request to be generated and
sent while a previous request and its response are still being
processed. It solves point 1 in case of a successful response;
otherwise, the `pos` isn't updated because of the error.
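To make the locking scheme above concrete, here is a minimal, synchronous sketch. It is not the SDK's code: it swaps the async `Mutex`/`OwnedMutexGuard` for `std::sync::Mutex` and `MutexGuard` so that it runs with the standard library only, and `PositionMarkers`, `Inner`, `generate_request`, `sync_once` and `run_concurrent_syncs` are hypothetical names chosen for illustration. The guard is taken when `pos` is read and released only once the simulated response has been handled, so concurrent syncs never send the same `pos` twice.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the position markers; only `pos` matters here.
#[derive(Debug, Default)]
struct PositionMarkers {
    pos: Option<String>,
}

/// Hypothetical stand-in for `SlidingSyncInner`.
#[derive(Default)]
struct Inner {
    position: Mutex<PositionMarkers>,
}

impl Inner {
    /// Reads `pos` for the new request and hands the guard back to the
    /// caller, so the lock stays held until the response is handled.
    fn generate_request(&self) -> (Option<String>, MutexGuard<'_, PositionMarkers>) {
        let guard = self.position.lock().unwrap();
        (guard.pos.clone(), guard)
    }

    /// Simulates one iteration: generate the request, "receive" a response,
    /// handle it, and only then release the position lock.
    /// Returns the `pos` that was sent with the request.
    fn sync_once(&self) -> Option<String> {
        let (request_pos, mut position_guard) = self.generate_request();

        // Pretend the network round-trip and response handling take a while.
        thread::sleep(Duration::from_millis(10));

        // The simulated server answers with the next position.
        let next = match &request_pos {
            None => 1,
            Some(pos) => pos.parse::<u64>().unwrap() + 1,
        };
        position_guard.pos = Some(next.to_string());

        // Release the position lock: another request can now be generated.
        drop(position_guard);

        request_pos
    }
}

/// Runs `n` sync iterations concurrently and returns the sorted list of
/// `pos` values that were sent.
fn run_concurrent_syncs(n: usize) -> Vec<Option<String>> {
    let inner = Arc::new(Inner::default());

    let handles: Vec<_> = (0..n)
        .map(|_| {
            let inner = Arc::clone(&inner);
            thread::spawn(move || inner.sync_once())
        })
        .collect();

    let mut sent: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    sent.sort();
    sent
}

fn main() {
    println!("{:?}", run_concurrent_syncs(4));
}
```

Because each iteration holds the lock from reading `pos` until the new `pos` is written, every request observes the position left by the previous one. With a read lock released right after reading, several threads could have read the same `pos`.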

This patch moves `SlidingSync::response_handling_lock` inside `SlidingSyncInner`. There is no reason why it's stored inside `SlidingSync`.


codecov · 2023-08-17T09:19:37Z

Codecov Report

Patch coverage: 90.90% and no project coverage change.

Comparison is base (0dac508) 77.06% compared to head (d946664) 77.06%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2414   +/-   ##
=======================================
  Coverage   77.06%   77.06%           
=======================================
  Files         181      181           
  Lines       19135    19139    +4     
=======================================
+ Hits        14746    14750    +4     
  Misses       4389     4389

Files Changed	Coverage Δ
crates/matrix-sdk/src/sliding_sync/mod.rs	`90.98% <88.23%> (+0.03%)`	⬆️
crates/matrix-sdk/src/sliding_sync/builder.rs	`66.95% <100.00%> (+0.88%)`	⬆️
crates/matrix-sdk/src/sliding_sync/cache.rs	`92.95% <100.00%> (ø)`


bnjbvr · 2023-08-17T09:15:43Z

crates/matrix-sdk/src/sliding_sync/mod.rs

+
+        // Release the position markers lock.
+        // It means that other request can start be sent.
+        drop(position_guard);


It's not necessary per se to have the drop here, since it will be dropped naturally at the end of the function, which happens on the next line. We could keep this if we wanted to keep the nice comment, though.

Yeah I know it's implicit, but I prefer to get an explicit drop here so that the whole logic is clear and documented.

bnjbvr · 2023-08-17T09:15:56Z

crates/matrix-sdk/src/sliding_sync/mod.rs

+        past_positions.push(position_guard.clone());
+
+        // Release the position markers lock.
+        // It means that other request can start be sent.


nit: missing/extra word here?

bnjbvr · 2023-08-17T09:16:29Z

crates/matrix-sdk/src/sliding_sync/mod.rs

+    ) -> Result<(
+        v4::Request,
+        RequestConfig,
+        BTreeSet<OwnedRoomId>,
+        OwnedMutexGuard<SlidingSyncPositionMarkers>,
+    )> {


Maybe it's time to return a struct with named fields instead of individual undocumented fields like this?

Agreed. I was thinking of addressing that in another PR, thoughts?

Sounds fine to me 👍
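For reference, the named-fields idea discussed in this thread could look roughly like the sketch below. It is only an illustration of the suggestion, not what a follow-up PR contains: `GeneratedRequest` and its field names are hypothetical, the thread does not say what the `BTreeSet<OwnedRoomId>` holds (hence the neutral `room_ids`), and `Request`, `RequestConfig`, `RoomId` and `PositionMarkers` are simplified placeholders for the SDK types. It also uses `std::sync::MutexGuard` rather than `OwnedMutexGuard` to stay within the standard library.

```rust
use std::collections::BTreeSet;
use std::sync::{Mutex, MutexGuard};

// Simplified placeholders for the real SDK types (assumptions).
struct Request {
    pos: Option<String>,
}
struct RequestConfig {
    timeout_secs: u64,
}
type RoomId = String;
struct PositionMarkers {
    pos: Option<String>,
}

/// Hypothetical replacement for the 4-tuple: each part is named and can
/// carry its own documentation.
struct GeneratedRequest<'a> {
    /// The request to send.
    request: Request,
    /// How to send it.
    request_config: RequestConfig,
    /// Room IDs returned alongside the request (meaning not stated in the thread).
    room_ids: BTreeSet<RoomId>,
    /// Held until the response has been fully handled.
    position_guard: MutexGuard<'a, PositionMarkers>,
}

fn generate_request(position: &Mutex<PositionMarkers>) -> GeneratedRequest<'_> {
    let position_guard = position.lock().unwrap();

    GeneratedRequest {
        request: Request { pos: position_guard.pos.clone() },
        request_config: RequestConfig { timeout_secs: 30 },
        room_ids: BTreeSet::new(),
        position_guard,
    }
}

fn main() {
    let position = Mutex::new(PositionMarkers { pos: Some("42".to_owned()) });
    let generated = generate_request(&position);

    println!(
        "pos={:?} timeout={}s rooms={} guarded_pos={:?}",
        generated.request.pos,
        generated.request_config.timeout_secs,
        generated.room_ids.len(),
        generated.position_guard.pos,
    );
}
```

Dropping the whole `GeneratedRequest` also drops the guard, so the lock's lifetime stays tied to the request without an extra tuple element to keep track of.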

bnjbvr

Thanks for fixing this.

Hywan added 2 commits August 17, 2023 09:16

chore(sdk): Move response_handling_lock inside SlidingSyncInner.

2718176

This patch moves `SlidingSync::response_handling_lock` inside `SlidingSyncInner`. There is no reason why it's stored inside `SlidingSync`.

Hywan requested a review from a team as a code owner August 17, 2023 08:59

Hywan requested review from bnjbvr and a team and removed request for a team August 17, 2023 08:59

bnjbvr reviewed Aug 17, 2023


bnjbvr approved these changes Aug 17, 2023


doc(sdk): Fix a typo.

d946664

Hywan merged commit 2c99810 into matrix-org:main Aug 17, 2023
