Graph linearisation

### Problem context

Rooms are a directed acyclic graph. Events contain a `depth` field which gives an approximate ordering. The `depth` can lie e.g for the events A -> B -> C as nothing stops C saying `depth: 11` even though the depth is actually `3` (starting at `1` with A). Basically, we should not use `depth` provided by servers.

However, we do. We rely on the `depth` to work out topological tokens when backfilling via the CSAPI call `/messages`. If two or more events share the same `depth` value, we tiebreak based on the stream position (which is just a monotonically increasing integer). This is "okay", but allows attackers to inject events into *any point* in room history by fiddling with the `depth`. This doesn't affect state, just the timeline sent to clients.

### Proposed solution 

We need to calculate `depth` for every event we receive based on `prev_events`. We may also need to retrospectively update the depth of events when we get told about new events.

The naive solution would be to recalculate the depth for every single event when a new event arrives. This takes `θ(n+m)` time per edge where n=nodes and m=edges. To insert m edges takes `θ(m^2)` time. There are algorithms which perform much better than this, notably [Katriel–Bodlaender](https://dl.acm.org/doi/pdf/10.1145/1159892.1159896) which:

> runs in time O(mk log^(2)n), where k is the treewidth of the input DAG

We should implement this in GMSL or even as a stand-alone library with an interface for the ordinal data structure:

> The ord labels of the nodes are maintained by an
Ordered List data structure ORD, which is a data structure that allows to maintain
a total order over a list of items and to perform the following operations in constant
amortized time [Dietz and Sleator 1987; Bender et al. 2002]: InsertAfter(x, y)
(InsertBefore(x, y)) inserts the item x immediately after (before) the item y in the
total order, Delete(x) removes the item x, the query Order(x, y) determines whether
x precedes y or y precedes x in the total order and Next(x)(Prev(x)) returns the
item that appears immediately after (before) x in the total order. These operations
are implemented by associating integer labels with the items in the list such that
the label associated with x is smaller than the label associated with y iff x precedes
y in the total order.

We should then make a separate component in Dendrite which consumes the roomserver output log and calculates depths for events, and then sends them on (or alternatively, keep the `syncapi` in charge of doing this).

NB: We still have the outstanding problem of what to do if the depths change and hence the linearisation of the DAG changes.

NBB: We still need to handle chunks of DAG (e.g joined the room, got some events, leave the room for a while, then rejoin). The above algorithm would need to be applied to each chunk independently, with book-keeping for a chunk along the lines of (from @erikjohnston ):

 1. Track "chunks" of events, i.e. connected sets of events in a room. (You might have multiple chunks if you leave and rejoin the room for example)
 2. Track which chunk is "live", i.e. what chunk is currently being updated (this is a bit hand wavey how you do this)
 3. Within a chunk use Katriel–Bodlaender to keep a topological ordering that can be efficiently updated when new events get added to a chunk
 4. Use that ordering when back paginating through a chunk
 5. Decided what to do when you get to the end of the current chunk, either stop or choose another chunk to start paginating through


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Graph linearisation #187

Problem context

Proposed solution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Graph linearisation #187

Description

Problem context

Proposed solution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions