Description
Problem context
Rooms are a directed acyclic graph. Events contain a depth
field which gives an approximate ordering. The depth
can lie e.g for the events A -> B -> C as nothing stops C saying depth: 11
even though the depth is actually 3
(starting at 1
with A). Basically, we should not use depth
provided by servers.
However, we do. We rely on the depth
to work out topological tokens when backfilling via the CSAPI call /messages
. If two or more events share the same depth
value, we tiebreak based on the stream position (which is just a monotonically increasing integer). This is "okay", but allows attackers to inject events into any point in room history by fiddling with the depth
. This doesn't affect state, just the timeline sent to clients.
Proposed solution
We need to calculate depth
for every event we receive based on prev_events
. We may also need to retrospectively update the depth of events when we get told about new events.
The naive solution would be to recalculate the depth for every single event when a new event arrives. This takes θ(n+m)
time per edge where n=nodes and m=edges. To insert m edges takes θ(m^2)
time. There are algorithms which perform much better than this, notably Katriel–Bodlaender which:
runs in time O(mk log^(2)n), where k is the treewidth of the input DAG
We should implement this in GMSL or even as a stand-alone library with an interface for the ordinal data structure:
The ord labels of the nodes are maintained by an
Ordered List data structure ORD, which is a data structure that allows to maintain
a total order over a list of items and to perform the following operations in constant
amortized time [Dietz and Sleator 1987; Bender et al. 2002]: InsertAfter(x, y)
(InsertBefore(x, y)) inserts the item x immediately after (before) the item y in the
total order, Delete(x) removes the item x, the query Order(x, y) determines whether
x precedes y or y precedes x in the total order and Next(x)(Prev(x)) returns the
item that appears immediately after (before) x in the total order. These operations
are implemented by associating integer labels with the items in the list such that
the label associated with x is smaller than the label associated with y iff x precedes
y in the total order.
We should then make a separate component in Dendrite which consumes the roomserver output log and calculates depths for events, and then sends them on (or alternatively, keep the syncapi
in charge of doing this).
NB: We still have the outstanding problem of what to do if the depths change and hence the linearisation of the DAG changes.
NBB: We still need to handle chunks of DAG (e.g joined the room, got some events, leave the room for a while, then rejoin). The above algorithm would need to be applied to each chunk independently, with book-keeping for a chunk along the lines of (from @erikjohnston ):
- Track "chunks" of events, i.e. connected sets of events in a room. (You might have multiple chunks if you leave and rejoin the room for example)
- Track which chunk is "live", i.e. what chunk is currently being updated (this is a bit hand wavey how you do this)
- Within a chunk use Katriel–Bodlaender to keep a topological ordering that can be efficiently updated when new events get added to a chunk
- Use that ordering when back paginating through a chunk
- Decided what to do when you get to the end of the current chunk, either stop or choose another chunk to start paginating through