Sender Data: Background task to retry fetching sender data for megolm sessions

**Latest update:** we think we don't need a background task at all: there is only a point re-trying if we get back more information about devices or users from a `/keys/query` request, so we will simply process any retries due whenever a `/keys/query` request ends.

Launch a background task when we start that retries fetching sender data for megolm sessions.

**Update:** originally we expected this background task not to handle "slow lane" (new sessions for which we don't have data but we suspect it may appear from the next keys/query response) items, but we now expect that it will, so we need make sure that we kick off a retry cycle whenever `receive_keys_query_response` completes. (This should not clash with normal retries so we need a lock of some kind.)

Part of https://github.com/matrix-org/matrix-rust-sdk/issues/3544 which is part of [Invisible Crypto](https://github.com/matrix-org/matrix-spec-proposals/pull/4153).

The algorithm below mentions jumping to certain steps. This is referring to the algorithm in https://github.com/matrix-org/matrix-rust-sdk/issues/3543 .

## Algorithm

This handles all of the following scenarios:

* Legacy sessions - these effectively have missing device info.
    * These may already exist before this code is deployed, or be created by importing keys that are missing device info, or by restoring backups that are missing device info.
* Non-legacy sessions with missing device info - we are waiting for it to appear.
* Sessions with device info that is signed by a cross-signing key we don't recognise - we are waiting to see whether it will appear.
* Sessions that we received but were interrupted before we finished processing them.
    * (Note: we don't store the session immediately we receive it, but if we have not finished with it by the time the sync is finished, we will have stored it. If we were interrupted before the sync is finished, we will see this to-device message again the next time we sync.)

In each case we will query the store for these sessions and retry them, updating their retry count if we need to continue waiting.

The background job begins on startup and repeatedly does the following:

* Query the sessions in the store that are due for a retry because either:
    * they are in state UnknownDevice and their next_retry_time_ms property is before now
    * or they are in state DeviceInfo and their next_retry_time_ms property is before now
* Order by next_retry_time_ms ascending and take the first $BATCH_SIZE (~200?).
* For each
    * if state==UnknownDevice, [take the lock, bail out if not], jump to step C.3 (Within the algorithm above, we don't technically need to spawn additional tasks in step F.2, but it won't do any harm to do so - they probably complete immediately because we're probably not waiting for a keys/query given we just waited for one in C.3.)
    * else, state==DeviceInfo, [take the lock, bail out if not], jump to F.3
* Await all of these tasks - they will complete quickly.
* Repeat. If there were no pending jobs… go to sleep for a while

Each session gets its own independent background task, which is OK because tasks are light and they all just wait for the same keys/query to finish. After that, they may save to the store, which might cause some contention but is probably fine.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

Algorithm

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Sender Data: Background task to retry fetching sender data for megolm sessions #3546

Description

Algorithm

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions