| 1 | +# MSC3030: Jump to date API endpoint |
| 2 | + |
| 3 | +Add an API that makes it easy to find the closest messages for a given |
| 4 | +timestamp. |
| 5 | + |
| 6 | +The goal of this change is to have clients be able to implement a jump to date |
| 7 | +feature in order to see messages back at a given point in time. Pick a date from |
| 8 | +a calendar, heatmap, or paginate next/previous between days and view all of the |
| 9 | +messages that were sent on that date. |
| 10 | + |
| 11 | +Alongside the [roadmap of feature parity with |
| 12 | +Gitter](https://github.com/vector-im/roadmap/issues/26), we're also interested |
| 13 | +in using this for a new better static Matrix archive. Our idea is to server-side |
| 14 | +render [Hydrogen](https://github.com/vector-im/hydrogen-web) and this new |
| 15 | +endpoint would allow us to jump back on the fly without having to paginate and |
| 16 | +keep track of everything in order to display the selected date. |
| 17 | + |
| 18 | +This is also useful for archiving and backup use cases: the new endpoint can be |
| 19 | +used to slice the messages by day and persist them to a file. |
| 20 | + |
| 21 | +Related issue: [*URL for an arbitrary day of history and navigation for next and |
| 22 | +previous days* |
| 23 | +(vector-im/element-web#7677)](https://github.com/vector-im/element-web/issues/7677) |
| 24 | + |
| 25 | + |
| 26 | +## Problem |
| 27 | + |
| 28 | +These types of use cases are not supported by the current Matrix API because it |
| 29 | +has no way to fetch or filter older messages besides a manual brute force |
| 30 | +pagination from the most recent event in the room. Paginating is time-consuming |
| 31 | +and expensive to process every event as you go (not practical for clients). |
| 32 | +Imagine wanting to get a message from 3 years ago 😫 |
| 33 | + |
| 34 | + |
| 35 | +## Proposal |
| 36 | + |
| 37 | +Add a new client API endpoint `GET |
| 38 | +/_matrix/client/v1/rooms/{roomId}/timestamp_to_event?ts=<timestamp>&dir=[f|b]` |
| 39 | +which fetches the closest `event_id` to the given timestamp `ts` query parameter |
| 40 | +in the direction specified by the `dir` query parameter. The direction `dir` |
| 41 | +query parameter accepts `f` for forward-in-time from the timestamp and `b` for |
| 42 | +backward-in-time from the timestamp. This endpoint also returns |
| 43 | +`origin_server_ts` to make it easy to do a quick comparison to see if the |
| 44 | +`event_id` fetched is too far out of range to be useful for your use case. |
| 45 | + |
| 46 | +When an event can't be found in the given direction, the endpoint returns a 404 |
| 47 | +with `"errcode":"M_NOT_FOUND"` (example error message `"error":"Unable to find event |
| 48 | +from 1672531200000 in direction f"`). |
| 49 | + |
| 50 | +In order to solve the problem where a homeserver does not have all of the history in a |
| 51 | +room and no suitably close event, we also add a server API endpoint `GET |
| 52 | +/_matrix/federation/v1/timestamp_to_event/{roomId}?ts=<timestamp>&dir=[f|b]` which other |
| 53 | +homeservers can use to ask about their closest `event_id` to the timestamp. This |
| 54 | +endpoint also returns `origin_server_ts` to make it easy to do a quick comparison to see |
| 55 | +if the remote `event_id` fetched is closer than the local one. After the local |
| 56 | +homeserver receives a response from the federation endpoint, it probably should |
| 57 | +try to backfill this event via the federation `/event/<event_id>` endpoint so that it's |
| 58 | +available to query with `/context` from a client in order to get a pagination token. |
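
To illustrate the comparison described above, here is a minimal sketch of how a homeserver might choose between its local candidate and one returned over federation. The `Candidate` type and `pick_closer` function are hypothetical names, and the sketch assumes both candidates were already found in the requested direction.

```python
from typing import Optional, TypedDict


class Candidate(TypedDict):
    """Shape of a `/timestamp_to_event` response body."""
    event_id: str
    origin_server_ts: int


def pick_closer(
    ts_ms: int,
    local: Optional[Candidate],
    remote: Optional[Candidate],
) -> Optional[Candidate]:
    """Return whichever candidate's origin_server_ts is nearest to ts_ms.

    Either candidate may be None (nothing found). On a tie the local
    candidate wins because it is listed first.
    """
    candidates = [c for c in (local, remote) if c is not None]
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c["origin_server_ts"] - ts_ms))
```

If the remote candidate wins, the homeserver would then fetch that event as described above before answering the client.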
| 59 | + |
| 60 | +The heuristics for deciding when to ask another homeserver for a closer event if |
| 61 | +your homeserver doesn't have something close, are left up to the homeserver |
| 62 | +implementation, although the heuristics will probably be based on whether the |
| 63 | +closest event is a forward/backward extremity indicating it's next to a gap of |
| 64 | +events which are potentially closer. |
| 65 | + |
| 66 | +A good heuristic for which servers to try first is to sort by servers that have |
| 67 | +been in the room the longest because they're most likely to have anything we ask |
| 68 | +about. |
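
As a rough sketch of that ordering heuristic, assuming the homeserver can look up when each remote server first joined the room (the `first_joined_ms` mapping and the function name are hypothetical):

```python
def order_servers_by_tenure(
    first_joined_ms: dict[str, int], own_server: str
) -> list[str]:
    """Order candidate servers so the longest-standing members come first.

    first_joined_ms maps a server name to the timestamp (in ms) at which
    that server first joined the room. Our own server is skipped since
    there is no point asking ourselves over federation.
    """
    others = (server for server in first_joined_ms if server != own_server)
    return sorted(others, key=lambda server: first_joined_ms[server])
```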
| 69 | + |
| 70 | +These endpoints are authenticated and should be rate-limited like similar client |
| 71 | +and federation endpoints to prevent resource exhaustion abuse. |
| 72 | + |
| 73 | +``` |
| 74 | +GET /_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction> |
| 75 | +{ |
| 76 | + "event_id": ... |
| 77 | + "origin_server_ts": ... |
| 78 | +} |
| 79 | +``` |
| 80 | + |
| 81 | +Federation API endpoint: |
| 82 | +``` |
| 83 | +GET /_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction> |
| 84 | +{ |
| 85 | + "event_id": ... |
| 86 | + "origin_server_ts": ... |
| 87 | +} |
| 88 | +``` |
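
For illustration, a client might build the request URL and run the "too far out of range" check on the response roughly as follows. This is a sketch only: the helper names, the base URL, and the `max_distance_ms` threshold are assumptions, not part of the proposal.

```python
from urllib.parse import quote, urlencode


def timestamp_to_event_url(
    base_url: str, room_id: str, ts_ms: int, direction: str
) -> str:
    """Build the client API URL for `/timestamp_to_event`."""
    if direction not in ("f", "b"):
        raise ValueError("dir must be 'f' or 'b'")
    # Room IDs contain characters such as '!' and ':' that must be escaped.
    room = quote(room_id, safe="")
    query = urlencode({"ts": ts_ms, "dir": direction})
    path = f"/_matrix/client/v1/rooms/{room}/timestamp_to_event"
    return f"{base_url.rstrip('/')}{path}?{query}"


def is_within_range(response: dict, ts_ms: int, max_distance_ms: int) -> bool:
    """Check whether the returned event is close enough to be useful."""
    return abs(response["origin_server_ts"] - ts_ms) <= max_distance_ms
```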
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +In order to paginate `/messages`, we need a pagination token which we can get |
| 93 | +using `GET /_matrix/client/r0/rooms/{roomId}/context/{eventId}?limit=0` for the |
| 94 | +`event_id` returned by `/timestamp_to_event`. |
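
The two-step flow could be sketched like this. The `get_json` callable is a hypothetical stand-in for whatever authenticated HTTP helper a client already has (it takes a path and returns the decoded JSON body); the sketch relies on the `start` and `end` tokens in the `/context` response.

```python
from typing import Callable
from urllib.parse import quote, urlencode


def pagination_tokens_for_timestamp(
    get_json: Callable[[str], dict],
    room_id: str,
    ts_ms: int,
    direction: str,
) -> tuple[str, str]:
    """Resolve a timestamp to the (start, end) tokens usable with /messages."""
    room = quote(room_id, safe="")
    query = urlencode({"ts": ts_ms, "dir": direction})

    # Step 1: find the closest event to the timestamp.
    found = get_json(f"/_matrix/client/v1/rooms/{room}/timestamp_to_event?{query}")

    # Step 2: ask for the context around that event; limit=0 skips the
    # surrounding events since only the tokens are needed.
    event = quote(found["event_id"], safe="")
    context = get_json(f"/_matrix/client/r0/rooms/{room}/context/{event}?limit=0")
    return context["start"], context["end"]
```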
| 95 | + |
| 96 | +We can always iterate on `/timestamp_to_event` later and return a pagination |
| 97 | +token directly in another MSC ⏩ |
| 98 | + |
| 99 | + |
| 100 | +## Potential issues |
| 101 | + |
| 102 | +### Receiving a rogue random delayed event ID |
| 103 | + |
| 104 | +Since `origin_server_ts` is not enforceably accurate, we can only hope that an event's |
| 105 | +`origin_server_ts` is relevant enough to its `prev_events` and descendants. |
| 106 | + |
| 107 | +If you ask for "the message with `origin_server_ts` closest to Jan 1st 2018" you |
| 108 | +might actually get a rogue random delayed one that was backfilled from a |
| 109 | +federated server, but the human can figure that out by trying again with a |
| 110 | +slight variation on the date or something. |
| 111 | + |
| 112 | +Since there isn't a good or fool-proof way to combat this, it's probably best to just go |
| 113 | +with `origin_server_ts` and not let perfect be the enemy of good. |
| 114 | + |
| 115 | + |
| 116 | +### Receiving an unrenderable event ID |
| 117 | + |
| 118 | +Another issue is that clients could land on an event they can't/won't render, |
| 119 | +such as a reaction, then they'll be forced to desperately seek around the |
| 120 | +timeline until they find an event they can do something with. |
| 121 | + |
| 122 | +Eg: |
| 123 | + - Client wants to jump to January 1st, 2022 |
| 124 | + - Server says there's an event on January 2nd, 2022 that is close enough |
| 125 | + - Client finds out there's a ton of unrenderable events like memberships, poll responses, reactions, etc at that time |
| 126 | + - Client starts paginating forwards, finally finding an event on January 27th it can render |
| 127 | + - Client wasn't aware that the actual nearest neighbouring event was backwards on December 28th, 2021 because it didn't paginate in that direction |
| 128 | + - User is confused that they are a month past the target date when the message is *right there*. |
| 129 | + |
| 130 | +Clients can be smarter here though. Clients can see when events were sent as |
| 131 | +they paginate and if they see they're going more than a couple days out, they |
| 132 | +can also try the other direction before going further and further away. |
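
One possible shape for that "try the other direction" logic is sketched below. It assumes the client has already paginated some events in each direction, ordered moving away from the target timestamp; `closest_renderable` and the `is_renderable` predicate are hypothetical names for client-specific code.

```python
from typing import Callable, Iterable, Optional


def closest_renderable(
    ts_ms: int,
    backward_events: Iterable[dict],
    forward_events: Iterable[dict],
    is_renderable: Callable[[dict], bool],
) -> Optional[dict]:
    """Pick the renderable event nearest to ts_ms, looking in both directions.

    Each iterable yields events in the order they are encountered while
    paginating away from ts_ms, so the first renderable event in each is
    the nearest renderable one on that side.
    """
    def first_renderable(events: Iterable[dict]) -> Optional[dict]:
        return next((e for e in events if is_renderable(e)), None)

    candidates = [
        e
        for e in (first_renderable(backward_events), first_renderable(forward_events))
        if e is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs(e["origin_server_ts"] - ts_ms))
```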
| 133 | + |
| 134 | +Clients can also just explain to the user what happened with a little toast: "We |
| 135 | +were unable to find an event to display on January 1st, 2022. The closest event |
| 136 | +after that date is on January 27th." |
| 137 | + |
| 138 | + |
| 139 | +### Abusing the `/timestamp_to_event` API to get the `m.room.create` event |
| 140 | + |
| 141 | +Although it's possible to jump to the start of the room and get the first event in the |
| 142 | +room (`m.room.create`) with `/timestamp_to_event?dir=f&ts=0`, clients should still use |
| 143 | +`GET /_matrix/client/v3/rooms/{roomId}/state/m.room.create/` to get the room creation |
| 144 | +event. |
| 145 | + |
| 146 | +In the future, with things like importing history via |
| 147 | +[MSC2716](https://github.com/matrix-org/matrix-spec-proposals/pull/2716), the first |
| 148 | +event you encounter with `/timestamp_to_event?dir=f&ts=0` could be an imported event before |
| 149 | +the room was created. |
| 150 | + |
| 151 | + |
| 152 | +## Alternatives |
| 153 | + |
| 154 | +We chose the current `/timestamp_to_event` route because it sounded like the |
| 155 | +easiest path forward to bring it to fruition and get some real-world experience. |
| 156 | +It was also on our mind during the [initial discussion](https://docs.google.com/document/d/1KCEmpnGr4J-I8EeaVQ8QJZKBDu53ViI7V62y5BzfXr0/edit#bookmark=id.qu9k9wje9pxm) because there was some prior art with a [WIP |
| 157 | +implementation](https://github.com/matrix-org/synapse/pull/9445/commits/91b1b3606c9fb9eede0a6963bc42dfb70635449f) |
| 158 | +from @erikjohnston. The alternatives haven't been thrown out for a particular |
| 159 | +reason and we could still go down those routes depending on how people like the |
| 160 | +current design. |
| 161 | + |
| 162 | + |
| 163 | +### Paginate `/messages?around=<timestamp>` from timestamp |
| 164 | + |
| 165 | +Add the `?around=<timestamp>` query parameter to the `GET |
| 166 | +/_matrix/client/r0/rooms/{roomId}/messages` endpoint. This will start the |
| 167 | +response at the message with `origin_server_ts` closest to the provided `around` |
| 168 | +timestamp. The direction is determined by the existing `?dir` query parameter. |
| 169 | + |
| 170 | +Use topological ordering, just as Element would use if you follow a permalink. |
| 171 | + |
| 172 | +This alternative could be confusing to the end-user around how this plays with |
| 173 | +the existing query parameters |
| 174 | +`/messages?from={paginationToken}&to={paginationToken}` which also determine |
| 175 | +what part of the timeline to query. Those parameters could be extended to accept |
| 176 | +timestamps in addition to pagination tokens but then could get confusing again |
| 177 | +when you start mixing timestamps and pagination tokens. The homeserver also has |
| 178 | +to disambiguate what a pagination token looks like vs a unix timestamp. Since |
| 179 | +pagination tokens don't follow a certain convention, some homeserver |
| 180 | +implementations may already be using arbitrary numeric tokens, which would |
| 181 | +be impossible to distinguish from a timestamp. |
| 182 | + |
| 183 | +A related alternative is to use `/messages` with `from_time`/`to_time` (or |
| 184 | +`from_ts`/`to_ts`) query parameters that only accept timestamps, which solves the |
| 185 | +confusion and disambiguation problem of trying to re-use the existing `from`/`to` |
| 186 | +query parameters. Re-using `/messages` would reduce the number of round-trips and |
| 187 | +potentially client-side implementations for the use case where you want to fetch |
| 188 | +a window of messages from a given time. However, it has the same round-trip problem if |
| 189 | +you want to use the returned `event_id` with `/context` or another endpoint |
| 190 | +instead. |
| 191 | + |
| 192 | + |
| 193 | +### Filter by date in `RoomEventFilter` |
| 194 | + |
| 195 | +Extend `RoomEventFilter` to be able to specify a timestamp or a date range. The |
| 196 | +`RoomEventFilter` can be passed via the `?filter` query param on the `/messages` |
| 197 | +endpoint. |
| 198 | + |
| 199 | +This suffers from the same confusion for the end-user about how it plays with |
| 200 | +`/messages?from={paginationToken}&to={paginationToken}`, which |
| 201 | +also determines what part of the timeline to query. |
| 202 | + |
| 203 | + |
| 204 | +### Return the closest event in any direction |
| 205 | + |
| 206 | +We considered omitting the `dir` parameter (or allowing `dir=c`) to have the server |
| 207 | +return the closest event to the timestamp, regardless of direction. However, this seems |
| 208 | +to offer little benefit. |
| 209 | + |
| 210 | +Firstly, for some use cases (such as archive viewing, where we want to show all the |
| 211 | +messages that happened on a particular day), an explicit direction is important, so this |
| 212 | +would have to be optional behaviour. |
| 213 | + |
| 214 | +For a regular messaging client, "directionless" search also offers little benefit: it is |
| 215 | +easy for the client to repeat the request in the other direction if the returned event |
| 216 | +is "too far away", and in any case it needs to manage an iterative search to handle |
| 217 | +unrenderable events, as discussed above. |
| 218 | + |
| 219 | +Implementing a directionless search on the server carries a performance overhead, since |
| 220 | +it must search both forwards and backwards on every request. In short, there is little |
| 221 | +reason to expect that a single `dir=c` request would be any more efficient than a pair of |
| 222 | +requests with `dir=b` and `dir=f`. |
| 223 | + |
| 224 | +### New `destination_server_ts` field |
| 225 | + |
| 226 | +Add a new field and index on messages called `destination_server_ts` which |
| 227 | +indicates when the message was received from federation. This gives a more |
| 228 | +"real" time for how someone would actually consume those messages. |
| 229 | + |
| 230 | +The contract of the API is "show me messages my server received at time T" |
| 231 | +rather than the messy confusion of showing a delayed message which happened to |
| 232 | +originally be sent at time T. |
| 233 | + |
| 234 | +We've decided against this approach because the backfill from federated servers |
| 235 | +could be horribly late. |
| 236 | + |
| 237 | +--- |
| 238 | + |
| 239 | +Related issue around `/sync` vs `/messages`, |
| 240 | +https://github.com/matrix-org/synapse/issues/7164 |
| 241 | + |
| 242 | +> Sync returns things in the order they arrive at the server; backfill returns |
| 243 | +> them in the order determined by the event graph. |
| 244 | +> |
| 245 | +> *-- @richvdh, https://github.com/matrix-org/synapse/issues/7164#issuecomment-605877176* |
| 246 | + |
| 247 | +> The general idea is that, if you're following a room in real-time (ie, |
| 248 | +> `/sync`), you probably want to see the messages as they arrive at your server, |
| 249 | +> rather than skipping any that arrived late; whereas if you're looking at a |
| 250 | +> historical section of timeline (ie, `/messages`), you want to see the best |
| 251 | +> representation of the state of the room as others were seeing it at the time. |
| 252 | +> |
| 253 | +> *-- @richvdh, https://github.com/matrix-org/synapse/issues/7164#issuecomment-605953296* |
| 254 | + |
| 255 | + |
| 256 | +## Security considerations |
| 257 | + |
| 258 | +We're only going to expose messages according to the existing message history |
| 259 | +setting in the room (`m.room.history_visibility`). No extra data is exposed, |
| 260 | +just a new way to sort through it all. |
| 261 | + |
| 262 | + |
| 263 | + |
| 264 | +## Unstable prefix |
| 265 | + |
| 266 | +While this MSC is not considered stable, the endpoints are available at `/unstable/org.matrix.msc3030` instead of their `/v1` description from above. |
| 267 | + |
| 268 | +``` |
| 269 | +GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction> |
| 270 | +{ |
| 271 | + "event_id": ... |
| 272 | + "origin_server_ts": ... |
| 273 | +} |
| 274 | +``` |
| 275 | + |
| 276 | +``` |
| 277 | +GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction> |
| 278 | +{ |
| 279 | + "event_id": ... |
| 280 | + "origin_server_ts": ... |
| 281 | +} |
| 282 | +``` |
| 283 | + |
| 284 | +Servers will indicate support for the new endpoint via a non-empty value for feature flag |
| 285 | +`org.matrix.msc3030` in `unstable_features` in the response to `GET |
| 286 | +/_matrix/client/versions`. |
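
A client could check for that flag along these lines. This is a sketch: `supports_msc3030` is a hypothetical helper, and it takes the already-decoded JSON body of the `/versions` response.

```python
def supports_msc3030(versions_response: dict) -> bool:
    """Return True if the server advertises the unstable MSC3030 endpoints.

    Any non-empty (truthy) value for the flag counts as support; a missing
    `unstable_features` object or a missing flag counts as no support.
    """
    features = versions_response.get("unstable_features", {})
    return bool(features.get("org.matrix.msc3030"))
```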