|
| 1 | +# MSC3970: Scope transaction IDs to devices |
| 2 | + |
| 3 | +Transaction identifiers in the Client-Server API are currently scoped to the |
| 4 | +concept of a "client session" which, when refresh tokens are used, can span a |
| 5 | +sequence of access tokens. |
| 6 | + |
| 7 | +The spec [reads](https://spec.matrix.org/v1.6/client-server-api/#transaction-identifiers): |
| 8 | + |
| 9 | +> The scope of a transaction ID is a “client session”, where that session is |
| 10 | +> identified by a particular access token. When refreshing an access token, the |
| 11 | +> transaction ID’s scope is retained. This means that if a client with token `A` |
| 12 | +> uses `TXN1` as their transaction ID, refreshes the token to `B`, and uses |
| 13 | +> `TXN1` again it’ll be assumed to be a duplicate request and ignored. If the |
| 14 | +> client logs out and back in between the `A` and `B` tokens, `TXN1` could be used |
| 15 | +> once for each. |
| 16 | +
|
| 17 | +The "client session" scope concept described can be complicated to implement. |
| 18 | + |
| 19 | +This MSC proposes that the scope of a transaction identifier is changed to something |
| 20 | +that is easier to implement whilst maintaining required transaction semantics. |
| 21 | + |
| 22 | +The transaction IDs appear in two parts of the Client-Server API spec: |
| 23 | + |
| 24 | +1. As a identifier to allow the homeserver to make some `PUT` endpoints |
| 25 | +[idempotent](https://spec.matrix.org/v1.6/client-server-api/#transaction-identifiers) |
| 26 | +2. An unsigned field in the event data model for a client to tell if it sent an |
| 27 | +event or not. a.k.a. solving the |
| 28 | +["local echo"](https://spec.matrix.org/v1.6/client-server-api/#local-echo) problem |
| 29 | + |
| 30 | +For reference, the `PUT` endpoints that have the a `{txnId}` param are: |
| 31 | + |
| 32 | +- [`PUT /_matrix/client/v3/rooms/{roomId}/send/{eventType}/{txnId}`](https://spec.matrix.org/v1.6/client-server-api/#put_matrixclientv3roomsroomidsendeventtypetxnid) |
| 33 | +- [`PUT /_matrix/client/v3/rooms/{roomId}/redact/{eventId}/{txnId}`](https://spec.matrix.org/v1.6/client-server-api/#put_matrixclientv3roomsroomidredacteventidtxnid) |
| 34 | +- [`PUT /_matrix/client/v3/sendToDevice/{eventType}/{txnId}`](https://spec.matrix.org/v1.6/client-server-api/#put_matrixclientv3sendtodeviceeventtypetxnid) |
| 35 | + |
| 36 | +## Proposal |
| 37 | + |
| 38 | +It is proposed that the scope of transaction identifiers be changed from a |
| 39 | +"client session" to a "device". |
| 40 | + |
| 41 | +A "device" is typically represented by a `device_id` elsewhere in the spec. |
| 42 | + |
| 43 | +For idempotency, this means the homeserver changing the method of identifying a |
| 44 | +request from: |
| 45 | + |
| 46 | +- (`client session`, `HTTP path of request which includes the transaction ID`) |
| 47 | + |
| 48 | +to: |
| 49 | + |
| 50 | +- (`device_id`, `HTTP path of request which includes the transaction ID`) |
| 51 | + |
| 52 | +For local echo, the homeserver would now include the `transaction_id` in the |
| 53 | +event data when it is serving a sync request from the same `device_id` as |
| 54 | +determined from the access token. |
| 55 | + |
| 56 | +## Potential issues |
| 57 | + |
| 58 | +### This is technically a breaking change to the spec |
| 59 | + |
| 60 | +The main "issue" I see with this proposal is that this is technically a breaking |
| 61 | +change to the spec. |
| 62 | + |
| 63 | +A device ID could have multiple sequences of access tokens associated |
| 64 | +with it (since device ID can be specified as a parameter to |
| 65 | +[`/login`](https://spec.matrix.org/v1.6/client-server-api/#post_matrixclientv3login)). |
| 66 | + |
| 67 | +For example, a "bot" implementation might masquerade as the same "device" despite |
| 68 | +calling `/login` on every run by passing the same device ID. Such a device might |
| 69 | +also use the same transaction ID on each run. |
| 70 | + |
| 71 | +Therefore it could potentially lead to a request being treated as a duplicate |
| 72 | +where previously it would have been treated as a distinct request. |
| 73 | + |
| 74 | +However, some evidence has been collated which suggests that nothing would be impacted |
| 75 | +in reality: |
| 76 | + |
| 77 | +#### 1. Data from the matrix.org homeserver suggests the change would have no impact |
| 78 | + |
| 79 | +The `matrix.org` homeserver is a reasonable size deployment and could be considered |
| 80 | +reasonably representative of the diversity of Matrix clients. |
| 81 | + |
| 82 | +Synapse maintains a `event_txn_id` table |
| 83 | +that contains a rolling 24 hour window of |
| 84 | +(`user_id`, `token_id`, `room_id`, `txn_id`) tuples. |
| 85 | + |
| 86 | +Having analysed the contents of the table, it appears that there are no repeated |
| 87 | +transaction IDs for a given user, token and room. |
| 88 | + |
| 89 | +n.b. not all `PUT` endpoints contribute to the table but I think the high-volume |
| 90 | +ones do |
| 91 | + |
| 92 | +This suggests that the widening of the scope from token to device would not have caused |
| 93 | +any issues during the periods sampled. |
| 94 | + |
| 95 | +For reference the following is the schema of the `event_txn_id` table: |
| 96 | + |
| 97 | +```sql |
| 98 | + Table "matrix.event_txn_id" |
| 99 | + Column | Type | Collation | Nullable | Default |
| 100 | +-------------+--------+-----------+----------+--------- |
| 101 | + event_id | text | | not null | |
| 102 | + room_id | text | | not null | |
| 103 | + user_id | text | | not null | |
| 104 | + token_id | bigint | | not null | |
| 105 | + txn_id | text | | not null | |
| 106 | + inserted_ts | bigint | | not null | |
| 107 | +Indexes: |
| 108 | + "event_txn_id_token_id" btree (token_id) |
| 109 | + "event_txn_id_event_id" UNIQUE, btree (event_id) |
| 110 | + "event_txn_id_ts" btree (inserted_ts) |
| 111 | + "event_txn_id_txn_id" UNIQUE, btree (room_id, user_id, token_id, txn_id) |
| 112 | +Foreign-key constraints: |
| 113 | + "event_txn_id_event_id_fkey" FOREIGN KEY (event_id) REFERENCES matrix.events(event_id) ON DELETE CASCADE |
| 114 | + "event_txn_id_token_id_fkey" FOREIGN KEY (token_id) REFERENCES matrix.access_tokens(id) ON DELETE CASCADE |
| 115 | +``` |
| 116 | + |
| 117 | +And the query to look for repeated transaction IDs: |
| 118 | + |
| 119 | +```sql |
| 120 | +SELECT e1.txn_id, LEFT(e1.user_id, 5) AS user_id, e1.token_id, e2.token_id, e1.inserted_ts, e2.inserted_ts FROM matrix.event_txn_id e1, matrix.event_txn_id e2 WHERE e1.txn_id = e2.txn_id AND e1.event_id <> e2.event_id AND e1.event_id < e2.event_id AND e1.user_id = e2.user_id AND e1.room_id = e2.room_id ORDER BY e1.token_id; |
| 121 | + txn_id | user_id | token_id | token_id | inserted_ts | inserted_ts |
| 122 | +--------+---------+----------+----------+-------------+------------- |
| 123 | +(0 rows) |
| 124 | +``` |
| 125 | + |
| 126 | +#### 2. Conduit homeserver already scopes transaction IDs to devices |
| 127 | + |
| 128 | +As highlighted by the new Complement |
| 129 | +[tests](https://github.com/matrix-org/complement/pull/613) the Conduit homeserver |
| 130 | +is already scoping transaction IDs to devices. |
| 131 | + |
| 132 | +I can't find a related issue [listed](https://gitlab.com/famedly/conduit/-/issues), |
| 133 | +so presumably this non-compliant behaviour isn't causing a known issue for |
| 134 | +admins and users of the Conduit homeserver? |
| 135 | + |
| 136 | +### Is the "device" concept the right level of abstraction to use? |
| 137 | + |
| 138 | +One way to look at it is that device is already widely used in the end-to-end |
| 139 | +encryption parts of the spec and so why isn't it suitable for this use case too? |
| 140 | + |
| 141 | +### What about two clients masquerading as a single device ID? |
| 142 | + |
| 143 | +I don't know if this actually works in practice. If this was a concern then it |
| 144 | +could be mitigated by clarifying in the spec that if a client wishes to submit |
| 145 | +requests using the same `device_id` as another client session that it should |
| 146 | +choose transaction identifiers that are unique to that client session. |
| 147 | + |
| 148 | +## Alternatives |
| 149 | + |
| 150 | +### Do nothing |
| 151 | + |
| 152 | +We could leave the transaction ID scope as is. |
| 153 | + |
| 154 | +However, it makes it difficult to implement a features like |
| 155 | +[MSC3861: Matrix architecture change to delegate authentication via OIDC](https://github.com/matrix-org/matrix-spec-proposals/pull/3861) |
| 156 | +as the concept of a "client session" doesn't really exist in OIDC. |
| 157 | + |
| 158 | +As noted above, at least one homeserver implementation is also not implementing |
| 159 | +the spec as it is today. |
| 160 | + |
| 161 | +It also turns out that the current implementation of refresh tokens in Synapse |
| 162 | +breaks the transaction ID semantics already and needs to be |
| 163 | +[fixed](https://github.com/matrix-org/synapse/issues/15141). |
| 164 | + |
| 165 | +### Make a backwards compatible change |
| 166 | + |
| 167 | +A backwards compatible alternative could be something like: |
| 168 | + |
| 169 | +1. For idempotency have clients opt-in to a new scope of transaction ID, but |
| 170 | +support the current semantics too for compatibility |
| 171 | +2. Have clients opt-in (e.g. request param on the sync endpoint) to receiving |
| 172 | +transaction ID for all events in the sync response and make the client |
| 173 | +responsible for identifying which messages they sent |
| 174 | + |
| 175 | +The disadvantage of this is that we create a load of extra maintenance work to |
| 176 | +support both semantics for a period of time for (empirically) no gain in return. |
| 177 | + |
| 178 | +## Security considerations |
| 179 | + |
| 180 | +A malicious client can adopt an existing device ID of a user. This could |
| 181 | +possibly allow some kind of denial of service attack. |
| 182 | + |
| 183 | +However, if such an attack where possible it would be possible to do so without |
| 184 | +this MSC as device IDs are crucial to the implementation of end-to-end encryption. |
| 185 | + |
| 186 | +## Other recommendations |
| 187 | + |
| 188 | +I'm not suggesting that these recommendations are addressed in this proposal, but |
| 189 | +more notes for future proposals or spec clarifications. |
| 190 | + |
| 191 | +### Clarification on idempotency semantics |
| 192 | + |
| 193 | +I have separately prepared a [spec PR](https://github.com/matrix-org/matrix-spec/pull/1449) |
| 194 | +to clarify some of the idempotency semantics that doesn't modify the spec but is |
| 195 | +useful to understand the context of this proposal. |
| 196 | + |
| 197 | +### Clarification on transaction ID time scope |
| 198 | + |
| 199 | +I also suggest that the spec be clarified over what time periods the transaction |
| 200 | +ID is scoped for such that clients can be aware. This cloud simply be to say |
| 201 | +that the time period is not defined and so may vary by implementation. |
| 202 | + |
| 203 | +### Recommend a less naive transaction ID format |
| 204 | + |
| 205 | +Currently the format of a transaction ID is not specified, but a recommendation |
| 206 | +is [given](https://spec.matrix.org/v1.6/client-server-api/#transaction-identifiers): |
| 207 | + |
| 208 | +> After the [previous] request has finished, the {txnId} value should be changed |
| 209 | +> (how is not specified; **a monotonically increasing integer is recommended**). |
| 210 | +
|
| 211 | +I think this is an unnecessarily naive recommendation. |
| 212 | + |
| 213 | +In most clients environments a pseudo-random number generator will be available and so |
| 214 | +could be used to generate a UUID/ULID or similar. |
| 215 | + |
| 216 | +As an aside, in my research I have found some clients use a "seconds since epoch" as a |
| 217 | +transaction ID which introduces a limit on the maximum possible event transmission rate |
| 218 | +per room to once per second. Perhaps a better recommendation could help prevent the |
| 219 | +such behaviour being introduced in future. |
| 220 | + |
| 221 | +## Unstable prefix |
| 222 | + |
| 223 | +None needed. |
| 224 | + |
| 225 | +## Dependencies |
| 226 | + |
| 227 | +None. |
0 commit comments