
Commit 8e46b7c

MadLittleMods, turt2live and richvdh committed
MSC3030: Jump to date API endpoint (#3030)
* Initial MSC draft for jump to date
* Update with alternate /timestamp_to_event endpoint
* Add origin_server_ts for quick remote to local comparison
  As discussed at matrix-org/synapse#9445 (comment)
* Add origin_server_ts to client endpoint
* Wrap lines
* Use stable when discussing MSC and document unstable
* Describe the direction parameter
* Add server support detection
* Fix typos
* Explain what happens when an event can't be found
  Fix #3030 (comment)
* Add context behind why we chose /timestamp_to_event vs alternatives
  Fix #3030 (comment)
* Add comments about authentication and rate-limiting
  Fix #3030 (comment)
* Return pagination token directly in future iteration
  See #3030 (comment)
* Abuse /timestamp_to_event to get create event
  As suggested by @turt2live, #3030 (comment)
* Unrenderable events
  As proposed by @turt2live, #3030 (comment)
* Add some complication thoughts around alternatives
  Context: #3030 (comment)
* Backfill event so we can get pagination token
  See #3030 (comment)
* Heuristic for which server to try first
  See #3030 (comment)
* Give a suggestion on where to backfill from
  See #3030 (comment)
* Add alternative suggestion from @alphapapa
  See #3030 (comment)
* Better wording and fix typo
  Co-authored-by: Travis Ralston <[email protected]>
* No difference in homeservers
  See #3030 (comment)
* Fix typos
  Co-authored-by: Richard van der Hoff <[email protected]>
* Fix extra word typo
* Summarizing discussion around why `dir` instead of closest
  See #3030 (comment)
* Adjust to just suggest the right way
  See #3030 (comment)
* Great simplification with the same meaning 🌟
  Co-authored-by: Richard van der Hoff <[email protected]>
* Perfect is the enemy of good
  See #3030 (comment)

Co-authored-by: Travis Ralston <[email protected]>
Co-authored-by: Richard van der Hoff <[email protected]>
1 parent 7e91b8e commit 8e46b7c


1 file changed: +286 -0 lines changed


proposals/3030-jump-to-date.md

# MSC3030: Jump to date API endpoint

Add an API that makes it easy to find the closest messages for a given
timestamp.

The goal of this change is to have clients be able to implement a jump to date
feature in order to see messages back at a given point in time. Pick a date from
a calendar or heatmap, or paginate next/previous between days, and view all of
the messages that were sent on that date.

Alongside the [roadmap of feature parity with
Gitter](https://github.com/vector-im/roadmap/issues/26), we're also interested
in using this for a new, better static Matrix archive. Our idea is to server-side
render [Hydrogen](https://github.com/vector-im/hydrogen-web), and this new
endpoint would allow us to jump back on the fly without having to paginate and
keep track of everything in order to display the selected date.

This is also useful for archiving and backup use cases: the new endpoint can be
used to slice the messages by day and persist them to file.

Related issue: [*URL for an arbitrary day of history and navigation for next and
previous days*
(vector-im/element-web#7677)](https://github.com/vector-im/element-web/issues/7677)

## Problem

These types of use cases are not supported by the current Matrix API because it
has no way to fetch or filter older messages besides manual, brute-force
pagination from the most recent event in the room. Paginating is time-consuming,
and processing every event along the way is expensive (not practical for
clients). Imagine wanting to get a message from 3 years ago 😫

## Proposal

Add a new client API endpoint `GET
/_matrix/client/v1/rooms/{roomId}/timestamp_to_event?ts=<timestamp>&dir=[f|b]`,
which returns the closest `event_id` to the timestamp given in the `ts` query
parameter, in the direction specified by the `dir` query parameter. The `dir`
query parameter accepts `f` for forward-in-time from the timestamp and `b` for
backward-in-time from the timestamp. This endpoint also returns
`origin_server_ts` to make it easy to do a quick comparison to see if the
`event_id` fetched is too far out of range to be useful for your use case.

When an event can't be found in the given direction, the endpoint responds with a
404 and `"errcode": "M_NOT_FOUND"` (example error message: `"error": "Unable to
find event from 1672531200000 in direction f"`).

In order to solve the problem where a homeserver does not have all of the history
in a room and no suitably close event, we also add a server API endpoint `GET
/_matrix/federation/v1/timestamp_to_event/{roomId}?ts=<timestamp>&dir=[f|b]`,
which other homeservers can use to ask about their closest `event_id` to the
timestamp. This endpoint also returns `origin_server_ts` to make it easy to do a
quick comparison to see if the remote `event_id` fetched is closer than the local
one. After the local homeserver receives a response from the federation endpoint,
it should probably try to backfill this event via the federation
`/event/<event_id>` endpoint so that it's available to query with `/context` from
a client in order to get a pagination token.

The heuristics for deciding when to ask another homeserver for a closer event, if
your homeserver doesn't have something close, are left up to the homeserver
implementation. They will probably be based on whether the closest event is a
forward/backward extremity, which indicates that it's next to a gap of events that
are potentially closer.

A good heuristic for which servers to try first is to sort by servers that have
been in the room the longest, because they're most likely to have anything we ask
about.

These endpoints are authenticated and should be rate-limited like similar client
and federation endpoints to prevent resource exhaustion abuse.
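The federation fallback and the server-ordering heuristic above could look roughly like the following TypeScript sketch. It is illustrative only. The MSC leaves these heuristics to implementations, and the names `RoomServer`, `joinedAtTs`, `orderServersByTenure`, `isInDirection` and `pickCloser` are hypothetical. The sketch also assumes that `f` accepts events at or after `ts` and that `b` accepts events at or before it.

```typescript
// Illustrative homeserver-side helpers for MSC3030. The MSC leaves these
// heuristics to implementations; every name here is hypothetical.

type Direction = "f" | "b";

interface CandidateEvent {
  event_id: string;
  origin_server_ts: number;
}

interface RoomServer {
  serverName: string;
  joinedAtTs: number; // hypothetical: when the server first joined the room
}

// Longest-joined servers first: they are most likely to have old history.
function orderServersByTenure(servers: RoomServer[]): string[] {
  return [...servers]
    .sort((a, b) => a.joinedAtTs - b.joinedAtTs)
    .map((s) => s.serverName);
}

// Assumption: `f` accepts events at or after `ts`, `b` at or before it.
function isInDirection(ev: CandidateEvent, ts: number, dir: Direction): boolean {
  return dir === "f" ? ev.origin_server_ts >= ts : ev.origin_server_ts <= ts;
}

// Keep whichever of the local and remote answers is closer to `ts`.
// On a tie the local event wins, so nothing needs to be backfilled.
function pickCloser(
  ts: number,
  dir: Direction,
  local: CandidateEvent | null,
  remote: CandidateEvent | null,
): CandidateEvent | null {
  const candidates = [local, remote].filter(
    (ev): ev is CandidateEvent => ev !== null && isInDirection(ev, ts, dir),
  );
  if (candidates.length === 0) return null;
  return candidates.reduce((best, ev) =>
    Math.abs(ev.origin_server_ts - ts) < Math.abs(best.origin_server_ts - ts)
      ? ev
      : best,
  );
}
```

If the remote event wins, the homeserver would then backfill it via `/event/<event_id>`, as described above, so that clients can reach it through `/context`.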

```
GET /_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
  "event_id": ...,
  "origin_server_ts": ...
}
```

Federation API endpoint:
```
GET /_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
  "event_id": ...,
  "origin_server_ts": ...
}
```
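A client-side call to the client endpoint above might look like the following TypeScript sketch. It assumes the caller supplies the homeserver base URL and an access token. It also assumes that `fetchFn` is any fetch-compatible function, for example the global `fetch` in browsers and Node 18+. The names `timestampToEvent`, `timestampToEventUrl` and `FetchLike` are hypothetical. A 404 `M_NOT_FOUND` is mapped to `null`, since it only means there is no event on that side of the timestamp.

```typescript
// Sketch of a client calling the MSC3030 endpoint. Names are hypothetical;
// `fetchFn` can be the global `fetch` or a test double.

type Direction = "f" | "b";

interface TimestampToEventResponse {
  event_id: string;
  origin_server_ts: number; // milliseconds since the Unix epoch
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number; json(): Promise<any> }>;

function timestampToEventUrl(
  baseUrl: string,
  roomId: string,
  ts: number,
  dir: Direction,
): string {
  // Room IDs contain reserved characters such as ':', so encode them.
  return (
    `${baseUrl}/_matrix/client/v1/rooms/${encodeURIComponent(roomId)}` +
    `/timestamp_to_event?ts=${ts}&dir=${dir}`
  );
}

// Resolves to the closest event in direction `dir`, or to null when the
// server answers 404 M_NOT_FOUND (no event on that side of `ts`).
async function timestampToEvent(
  fetchFn: FetchLike,
  baseUrl: string,
  accessToken: string,
  roomId: string,
  ts: number,
  dir: Direction,
): Promise<TimestampToEventResponse | null> {
  const res = await fetchFn(timestampToEventUrl(baseUrl, roomId, ts, dir), {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  const body = await res.json();
  if (res.status === 404 && body.errcode === "M_NOT_FOUND") {
    return null;
  }
  if (res.status !== 200) {
    throw new Error(`timestamp_to_event failed: ${body.errcode ?? res.status}`);
  }
  return body as TimestampToEventResponse;
}
```

For example, `timestampToEvent(fetch, "https://matrix.example.org", token, roomId, Date.UTC(2022, 0, 1), "f")` asks for the closest event forward from midnight UTC on January 1st, 2022. Note that `ts` is in milliseconds, like `origin_server_ts`.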

---

In order to paginate `/messages`, we need a pagination token, which we can get
using `GET /_matrix/client/r0/rooms/{roomId}/context/{eventId}?limit=0` for the
`event_id` returned by `/timestamp_to_event`.

We can always iterate on `/timestamp_to_event` later and return a pagination
token directly in another MSC ⏩
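To turn the returned `event_id` into a `/messages` pagination position, a client could build the `/context` request and then choose a token, as in the following sketch. It uses `v3`, the current version of the client-server `r0` paths named above. The helper names `contextUrl` and `messagesUrl` are hypothetical. In the client-server API, `/context` returns `start` for paginating backwards and `end` for paginating forwards.

```typescript
// Sketch: from an event ID to a /messages request. Helper names are
// hypothetical; the endpoint paths and `start`/`end` fields follow the
// Matrix client-server API.

type Direction = "f" | "b";

interface ContextResponse {
  start?: string; // token for paginating backwards (dir=b)
  end?: string; // token for paginating forwards (dir=f)
}

function contextUrl(baseUrl: string, roomId: string, eventId: string): string {
  // limit=0: we only need the pagination tokens, not surrounding events.
  return (
    `${baseUrl}/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}` +
    `/context/${encodeURIComponent(eventId)}?limit=0`
  );
}

function messagesUrl(
  baseUrl: string,
  roomId: string,
  ctx: ContextResponse,
  dir: Direction,
  limit: number,
): string {
  const from = dir === "f" ? ctx.end : ctx.start;
  if (from === undefined) {
    throw new Error("no pagination token in /context response");
  }
  return (
    `${baseUrl}/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}` +
    `/messages?from=${encodeURIComponent(from)}&dir=${dir}&limit=${limit}`
  );
}
```

This is the extra round-trip that a future iteration could remove by returning a pagination token directly from `/timestamp_to_event`.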

## Potential issues

### Receiving a rogue random delayed event ID

Since `origin_server_ts` is not enforceably accurate, we can only hope that an
event's `origin_server_ts` is relevant enough to its `prev_events` and
descendants.

If you ask for "the message with `origin_server_ts` closest to Jan 1st 2018" you
might actually get a rogue random delayed one that was backfilled from a
federated server, but the human can figure that out by trying again with a
slight variation on the date or something.

Since there isn't a good or foolproof way to combat this, it's probably best to
just go with `origin_server_ts` and not let perfect be the enemy of good.


### Receiving an unrenderable event ID

Another issue is that clients could land on an event they can't/won't render,
such as a reaction, in which case they'll be forced to desperately seek around
the timeline until they find an event they can do something with.

For example:
- Client wants to jump to January 1st, 2022
- Server says there's an event on January 2nd, 2022 that is close enough
- Client finds out there's a ton of unrenderable events like memberships, poll responses, reactions, etc. at that time
- Client starts paginating forwards, finally finding an event on January 27th it can render
- Client wasn't aware that the actual nearest neighbouring event was backwards on December 28th, 2021 because it didn't paginate in that direction
- User is confused that they are a month past the target date when the message is *right there*.

Clients can be smarter here, though. As they paginate, clients can see when
events were sent, and if they see they're more than a couple of days out, they
can try the other direction before going further and further away.

Clients can also just explain to the user what happened with a little toast: "We
were unable to find an event to display on January 1st, 2022. The closest event
after that date is on January 27th."
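The "try the other direction" strategy above can be sketched as a scan over one page of paginated events. `scanPage` is a hypothetical helper. The event shape, the set of renderable types and the two-day threshold are assumptions chosen for illustration. If the scan gives up in both directions, the client could fall back to a toast like the one above.

```typescript
// Sketch of client-side handling for unrenderable events. The renderable
// types and the drift threshold are illustrative assumptions.

interface TimelineEvent {
  type: string;
  origin_server_ts: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical: the event types this particular client can display.
const RENDERABLE_TYPES = new Set(["m.room.message", "m.room.encrypted"]);

type ScanResult =
  | { kind: "found"; event: TimelineEvent }
  | { kind: "tryOtherDirection" } // drifted too far from the target date
  | { kind: "needMore" }; // page exhausted; paginate further

// Scan one page of events, in pagination order, for a renderable event.
// Stop as soon as events are more than `maxDriftMs` from `targetTs`.
function scanPage(
  events: TimelineEvent[],
  targetTs: number,
  maxDriftMs: number = 2 * DAY_MS,
): ScanResult {
  for (const ev of events) {
    if (Math.abs(ev.origin_server_ts - targetTs) > maxDriftMs) {
      return { kind: "tryOtherDirection" };
    }
    if (RENDERABLE_TYPES.has(ev.type)) {
      return { kind: "found", event: ev };
    }
  }
  return { kind: "needMore" };
}
```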

### Abusing the `/timestamp_to_event` API to get the `m.room.create` event

Although it's possible to jump to the start of the room and get the first event in the
room (`m.room.create`) with `/timestamp_to_event?dir=f&ts=0`, clients should still use
`GET /_matrix/client/v3/rooms/{roomId}/state/m.room.create/` to get the room creation
event.

In the future, with things like importing history via
[MSC2716](https://github.com/matrix-org/matrix-spec-proposals/pull/2716), the first
event you encounter with `/timestamp_to_event?dir=f&ts=0` could be an imported event before
the room was created.

## Alternatives

We chose the current `/timestamp_to_event` route because it sounded like the
easiest path forward to bring it to fruition and get some real-world experience.
It was also on our mind during the [initial discussion](https://docs.google.com/document/d/1KCEmpnGr4J-I8EeaVQ8QJZKBDu53ViI7V62y5BzfXr0/edit#bookmark=id.qu9k9wje9pxm) because there was some prior art with a [WIP
implementation](https://github.com/matrix-org/synapse/pull/9445/commits/91b1b3606c9fb9eede0a6963bc42dfb70635449f)
from @erikjohnston. The alternatives weren't ruled out for any particular
reason, and we could still go down those routes depending on how people like the
current design.

### Paginate `/messages?around=<timestamp>` from timestamp

Add the `?around=<timestamp>` query parameter to the `GET
/_matrix/client/r0/rooms/{roomId}/messages` endpoint. This will start the
response at the message with `origin_server_ts` closest to the provided `around`
timestamp. The direction is determined by the existing `?dir` query parameter.

Use topological ordering, just as Element would use if you follow a permalink.

It could be confusing to the end-user how this alternative plays with the
existing query parameters `/messages?from={paginationToken}&to={paginationToken}`,
which also determine what part of the timeline to query. Those parameters could
be extended to accept timestamps in addition to pagination tokens, but then it
could get confusing again when you start mixing timestamps and pagination tokens.
The homeserver also has to disambiguate what a pagination token looks like vs a
Unix timestamp. Since pagination tokens don't follow a certain convention, some
homeserver implementations may already be using arbitrary numeric tokens, which
would be impossible to distinguish from a timestamp.

A related alternative is to use `/messages` with `from_time`/`to_time` (or
`from_ts`/`to_ts`) query parameters that only accept timestamps, which solves the
confusion and disambiguation problem of trying to re-use the existing `from`/`to`
query parameters. Re-using `/messages` would reduce the number of round-trips,
and potentially simplify client-side implementations, for the use case where you
want to fetch a window of messages from a given time. But it has the same
round-trip problem if you want to use the returned `event_id` with `/context` or
another endpoint instead.

### Filter by date in `RoomEventFilter`

Extend `RoomEventFilter` to be able to specify a timestamp or a date range. The
`RoomEventFilter` can be passed via the `?filter` query param on the `/messages`
endpoint.

This suffers from the same end-user confusion about how it plays with
`/messages?from={paginationToken}&to={paginationToken}`, which also determines
what part of the timeline to query.

### Return the closest event in any direction

We considered omitting the `dir` parameter (or allowing `dir=c`) to have the server
return the closest event to the timestamp, regardless of direction. However, this seems
to offer little benefit.

Firstly, for some use cases (such as archive viewing, where we want to show all the
messages that happened on a particular day), an explicit direction is important, so this
would have to be optional behaviour.

For a regular messaging client, "directionless" search also offers little benefit: it is
easy for the client to repeat the request in the other direction if the returned event
is "too far away", and in any case it needs to manage an iterative search to handle
unrenderable events, as discussed above.

Implementing a directionless search on the server carries a performance overhead, since
it must search both forwards and backwards on every request. In short, there is little
reason to expect that a single `dir=c` request would be any more efficient than a pair of
requests with `dir=b` and `dir=f`.
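As a sketch of the client-side alternative described above, a "closest in either direction" lookup can be built from one `dir=b` request and one `dir=f` request. The `Lookup` type stands in for any function that calls `/timestamp_to_event` and resolves to `null` on `M_NOT_FOUND`. Both `Lookup` and `closestEitherDirection` are hypothetical names.

```typescript
// Sketch: emulate a directionless lookup with two directional requests.

type Direction = "f" | "b";

interface TimestampToEventResponse {
  event_id: string;
  origin_server_ts: number;
}

// Hypothetical wrapper around /timestamp_to_event; null means M_NOT_FOUND.
type Lookup = (
  ts: number,
  dir: Direction,
) => Promise<TimestampToEventResponse | null>;

async function closestEitherDirection(
  lookup: Lookup,
  ts: number,
): Promise<TimestampToEventResponse | null> {
  // The two requests are independent, so issue them concurrently.
  const [before, after] = await Promise.all([lookup(ts, "b"), lookup(ts, "f")]);
  if (before === null) return after;
  if (after === null) return before;
  // Prefer the earlier event when both are equally far from `ts`.
  return ts - before.origin_server_ts <= after.origin_server_ts - ts
    ? before
    : after;
}
```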

### New `destination_server_ts` field

Add a new field and index on messages called `destination_server_ts`, which
indicates when the message was received from federation. This gives a more
"real" time for how someone would actually consume those messages.

The contract of the API is "show me messages my server received at time T"
rather than the messy confusion of showing a delayed message which happened to
originally be sent at time T.

We've decided against this approach because the backfill from federated servers
could be horribly late.

---

Related issue around `/sync` vs `/messages`:
https://github.com/matrix-org/synapse/issues/7164

> Sync returns things in the order they arrive at the server; backfill returns
> them in the order determined by the event graph.
>
> *-- @richvdh, https://github.com/matrix-org/synapse/issues/7164#issuecomment-605877176*

> The general idea is that, if you're following a room in real-time (ie,
> `/sync`), you probably want to see the messages as they arrive at your server,
> rather than skipping any that arrived late; whereas if you're looking at a
> historical section of timeline (ie, `/messages`), you want to see the best
> representation of the state of the room as others were seeing it at the time.
>
> *-- @richvdh, https://github.com/matrix-org/synapse/issues/7164#issuecomment-605953296*

## Security considerations

We're only going to expose messages according to the existing message history
setting in the room (`m.room.history_visibility`). No extra data is exposed,
just a new way to sort through it all.

## Unstable prefix

While this MSC is not considered stable, the endpoints are available at
`/unstable/org.matrix.msc3030` instead of the `/v1` paths described above.

```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
  "event_id": ...,
  "origin_server_ts": ...
}
```

```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
  "event_id": ...,
  "origin_server_ts": ...
}
```

Servers will indicate support for the new endpoint via a non-empty value for the
feature flag `org.matrix.msc3030` in `unstable_features` in the response to `GET
/_matrix/client/versions`.
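A client might check for the feature flag before choosing between the unstable endpoint and a fallback, roughly as follows. The sketch only inspects the `unstable_features` map from `GET /_matrix/client/versions`. Two parts are assumptions: treating any truthy value as support (based on the "non-empty value" wording above), and returning `null` when the flag is absent. `timestampToEventPrefix` is a hypothetical name.

```typescript
// Sketch of MSC3030 support detection from a /_matrix/client/versions body.

interface VersionsResponse {
  versions: string[];
  unstable_features?: Record<string, boolean>;
}

const MSC3030_FLAG = "org.matrix.msc3030";

// Returns the client API prefix to use for /timestamp_to_event, or null if
// the server does not advertise the unstable feature (the caller would then
// fall back to paginating /messages manually).
function timestampToEventPrefix(versions: VersionsResponse): string | null {
  // Assumption: any truthy flag value counts as "supported".
  if (versions.unstable_features?.[MSC3030_FLAG]) {
    return `/_matrix/client/unstable/${MSC3030_FLAG}`;
  }
  return null;
}
```

The returned prefix would then be joined with `/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>`, as in the unstable client endpoint shown above.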
