MSC3401: Native Group VoIP Signalling #3401
@@ -0,0 +1,310 @@ | ||||||||||||||
# MSC3401: Native Group VoIP signalling | ||||||||||||||
|
||||||||||||||
Note: previously this MSC included SFU signalling which has now been moved to | ||||||||||||||
[MSC3898](https://github.com/matrix-org/matrix-spec-proposals/pull/3898) to | ||||||||||||||
avoid making this MSC too large. | ||||||||||||||
|
||||||||||||||
## Problem | ||||||||||||||
|
||||||||||||||
VoIP signalling in Matrix is currently conducted via timeline events in a 1:1 room. | ||||||||||||||
This has some limitations, especially if you try to broaden the approach to multiparty VoIP calls: | ||||||||||||||
|
||||||||||||||
* VoIP signalling can generate a lot of events as candidates are incrementally | ||||||||||||||
discovered, and for rapid call setup these need to be relayed as rapidly as | ||||||||||||||
possible. | ||||||||||||||
* Putting these into the room timeline means that if the client has a gappy | ||||||||||||||
sync, for VoIP to be reliable it will need to go back and fill in the gap | ||||||||||||||
before it can process any VoIP events, slowing things down badly. | ||||||||||||||
* Timeline events are (currently) subject to harsh rate limiting, as they are | ||||||||||||||
assumed to be a spam vector. | ||||||||||||||
* VoIP signalling leaks IP addresses. There is no reason to keep these around | ||||||||||||||
for posterity, and they should only be exposed to the devices which care about | ||||||||||||||
them. | ||||||||||||||
* Candidates are ephemeral data, and there is no reason to keep them around for | ||||||||||||||
posterity - they're just clogging up the DAG. | ||||||||||||||
|
||||||||||||||
Meanwhile we have no native signalling for group calls at all, forcing you to instead embed a separate system such as Jitsi, which has its own dependencies and doesn't directly leverage any of Matrix's encryption, decentralisation, access control or data model. | ||||||||||||||
|
||||||||||||||
## Proposal | ||||||||||||||
|
||||||||||||||
This proposal provides a signalling framework using to-device messages which can
be applied to native Matrix 1:1 calls, full-mesh calls and, in the future, SFU
calls, cascaded SFU calls, MCU calls, and hybrid SFU/MCU approaches. It replaces
the early flawed sketch at
[MSC2359](https://github.com/matrix-org/matrix-doc/pull/2359).
|
||||||||||||||
This does not immediately replace the current 1:1 call signalling, but may in future provide a migration path to unified signalling for 1:1 and group calls. | ||||||||||||||
|
||||||||||||||
Diagrammatically, this looks like: | ||||||||||||||
|
||||||||||||||
1:1: | ||||||||||||||
|
||||||||||||||
```diagram | ||||||||||||||
A -------- B | ||||||||||||||
``` | ||||||||||||||
|
||||||||||||||
Full mesh between clients:
|
||||||||||||||
```diagram
A -------- B
 \        /
  \      /
   \    /
    \  /
     C
```
|
||||||||||||||
SFU (aka Focus): | ||||||||||||||
> Review comment: Bikeshedding warning: I'm relatively new to the WebRTC/VoIP industry, but I have never heard the term focus used in place of SFU. Is this a commonly known term? Should we be using SFU in this spec instead? Including renaming

> Review comment: the reason i originally went with However, in the current simpler draft, the only time you include this field is if you are using a conferencing focus of some kind. But, this proposal is not meant to just be for SFUs - the device you use to focus together your view of the conference could (in future) equally be an MCU as much as an SFU. Hence using the correct more generic term of 'focus' rather than making it specific to SFU technology. For instance, the server could advertise a stream which composites together a mosaic of different feeds for a non-E2EE call... at which point it's acting as a (hybrid) MCU. The term 'focus' comes from SIP (e.g. https://datatracker.ietf.org/doc/html/rfc3840#section-10.18) and is the standard term there for "an endpoint you connect to which mixes together other endpoints". I'm slightly inclined to keep it, to keep things flexible for future more sophisticated foci tech.

> Review comment: Can we call it

> Review comment: focus is a pretty well-known word, and foci is its plural. i don't particularly want to call it 'focuses', given that's a different word (the 3rd person present form of 'to focus'). not sure this is a showstopper.

> Review comment: It definitely isn't a showstopper but I would like to come up with a better name if we can. It is also a bit of a red-flag that just about everything else in the MSC is calling it a SFU.

> Review comment: While focus is a well-known word, outside of Britain its plural is 'focuses', so I would expect that a lot of people are going to be similarly confused over its meaning. Even the Cambridge Dictionary lists 'focuses' as the plural, while listing 'foci' as the formal plural in the UK. Might it be possible to at least mention in the spec that it's used in this sense?

> Review comment: I'm coming around to using "foci" as the word and there are references out there in the wild for "foci" being used in SIP terminology https://datatracker.ietf.org/doc/html/rfc4575#section-3.8 I think we should keep foci.
||||||||||||||
|
||||||||||||||
```diagram
A __    __ B
    \  /
     F
     |
     |
     C
Where F is an SFU focus
```
|
||||||||||||||
Cascaded decentralised SFU: | ||||||||||||||
|
||||||||||||||
```diagram
A1 --.           .-- B1
A2 ---Fa ----- Fb--- B2
       \       /
        \     /
         \   /
          \ /
           Fc
          |  |
         C1  C2

Where Fa, Fb and Fc are SFU foci, one per homeserver, each with two clients.
```
|
||||||||||||||
### m.call state event | ||||||||||||||
|
||||||||||||||
The user who wants to initiate a call sends an `m.call` state event into the room to inform the room participants that a call is happening in the room. This effectively becomes the placeholder event in the timeline which clients would use to display the call in their scrollback (including duration and termination reason using `m.terminated`). Its body has the following fields:
> Review comment: How should glare be handled at the group call level in the case where multiple parties actually didn't mean to set up separate group calls in a room but just meant to call each other? For example, we could dictate that calls that have the same purpose and name should be able to replace each other in case of glare?

> Review comment: This is a very good question. Any idea @ara4n? I think because the In any case, I think glare is a non issue for the

> Review comment: Glare can happen with any call type though if two clients decide to set

> Review comment: The Perhaps there should be a way to specify a different power level requirement for different intents as well. A Discord user would expect to be able to start a room's call freely without disturbing other members of the room ala

> Review comment: Imho outside of DMs (where both users have PL100 anyway usually) calls should not be allowed for normal users. It is still a vector of spam. Just imagine having calls being started in Matrix HQ. It would just cause issues imho. Imho it is a sane default to restrict this and need active changes to allow it in a room.

> Review comment: This is what I mean about the different call intents causing different levels of disruption. You're right, obviously Unless I'm misunderstanding the purpose of
||||||||||||||
|
||||||||||||||
* `m.intent` to describe the intended UX for handling the call. One of:
  * `m.ring` if the call is meant to cause the room participants' devices to ring
    (e.g. 1:1 call or group call)
  * `m.prompt` if the call should be presented as a conference call which users
    in the room are prompted to connect to
  * `m.room` if the call should be presented as a voice/video channel in which
    the user is immediately immersed on selecting the room.
* `m.type` to say whether the initial type of call is voice only (`m.voice`) or | ||||||||||||||
video (`m.video`). This signals the intent of the user when placing the call | ||||||||||||||
to the participants (i.e. "i want to have a voice call with you" or "i want to | ||||||||||||||
have a video call with you") and warns the receiver whether they may be | ||||||||||||||
expected to view video or not, and provide suitable initial UX for displaying | ||||||||||||||
that type of call... even if it later gets upgraded to a video call. | ||||||||||||||
* `m.terminated` if this event indicates that the call in question has finished, | ||||||||||||||
including the reason why. (A voice/video room will never terminate.) (do we | ||||||||||||||
need a duration, or can we figure that out from the previous state event?). | ||||||||||||||
* `m.name` as an optional human-visible label for the call (e.g. "Conference | ||||||||||||||
call"). | ||||||||||||||
* The state key is a unique ID for that call. (We can't use the event ID, given
  `m.type` and `m.terminated` are mutable). If there are multiple non-terminated
  `m.call` state events in the room, the client should display the most recently
  edited event.
|
||||||||||||||
For instance: | ||||||||||||||
|
||||||||||||||
```jsonc | ||||||||||||||
{ | ||||||||||||||
"type": "m.call", | ||||||||||||||
"state_key": "cvsiu2893", | ||||||||||||||
"content": { | ||||||||||||||
"m.intent": "m.room", | ||||||||||||||
"m.type": "m.voice", | ||||||||||||||
"m.name": "Voice room" | ||||||||||||||
} | ||||||||||||||
} | ||||||||||||||
``` | ||||||||||||||
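The rule above for choosing which call to display can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `pick_displayed_call` is hypothetical, a call is treated as terminated whenever its content has an `m.terminated` key (the proposal does not fix the shape of that field), and `origin_server_ts` is used as a stand-in for "most recently edited".

```python
from typing import Optional


def pick_displayed_call(call_events: list[dict]) -> Optional[dict]:
    """Return the m.call state event a client would display, or None.

    Hypothetical helper: takes the current m.call state events of a room.
    """
    active = [
        ev for ev in call_events
        if ev.get("type") == "m.call"
        # Assumption: the mere presence of m.terminated marks the call as over.
        and "m.terminated" not in ev.get("content", {})
    ]
    if not active:
        return None
    # Assumption: origin_server_ts approximates "most recently edited".
    return max(active, key=lambda ev: ev.get("origin_server_ts", 0))
```

A client using the unstable prefix would compare against `org.matrix.msc3401.call` instead of `m.call`.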
|
||||||||||||||
We mandate at most one call per room at any given point to avoid UX nightmares - if you want the user to participate in multiple parallel calls, you should simply create multiple rooms, each with one call. | ||||||||||||||
> Review comment: I think that this is worth considering though, the UX nightmare might not be that bad (some clients might even work entirely with this possibility), and personally i think that putting the conf ID in a sub-field is just asking for problems (if the previous call information gets overridden by a person sending another state event for a "new" call while the last one is still in-progress.) Why not move conf_id into the state_key, currently declare multiple calls UB and unsupported, while noting that speccing it and properly seating it would be a case for a future MSC?

> Review comment: have done.

> Review comment: Re-opening this one because we've just had a glare-like bug on Element Call where multiple people entered the call at the same time (as you do) and multiple conferences got created in the same room. In general, we're going to want some way to handle glare of several people hitting the 'start conference call' button at the same time. Allowing multiple calls in a room means we need to handle this somehow. It's not impossible (eg. we could define some common ID for 'the' call in a room allowing you to use other IDs for other calls?) but I'd just like to check that we really want to deal with this complexity.

> Review comment: I am also very much in favour of having the With MSC3985 we now also have a separate method to create break-out rooms, so it feels like multiple calls in one room are no longer necessary I also think we should be able to use the

> Review comment: I think there is still an issue with relying on With separate state keys, this is a lot easier, because it gives you a way to efficiently look up the current state of any call, current or historical.
||||||||||||||
|
||||||||||||||
### Call participation | ||||||||||||||
|
||||||||||||||
Users who want to participate in the call declare this by publishing an `m.call.member` state event using their Matrix ID as the state key (thus ensuring other users cannot edit it). The event contains an array `m.calls` of objects describing which calls the user is participating in within that room. This array must contain one item (for now).
|
||||||||||||||
|
||||||||||||||
The fields within the item in the `m.calls` contents are: | ||||||||||||||
|
||||||||||||||
* `m.call_id` - The ID of the conference the user is claiming to participate in.
  If this doesn't match an unterminated `m.call` event, it should be ignored.
* `m.devices` - The list of the member's active devices in the call. A member
  may join from one or more devices at a time, but they may not have two active
  sessions from the same device. Each device contains the following properties:
  * `device_id` - The device ID to use for to-device messages when establishing
    a call.
  * `session_id` - A unique identifier used for resolving duplicate sessions
    from a given device. When the `session_id` field changes in an incoming
    `m.call.member` event, any existing calls from this device in this call
    should be terminated. `session_id` should be generated once per client
    session on application load.
  * `expires_ts` - A POSIX timestamp in milliseconds describing when this device
    data should be considered stale. When updating their own device state,
    clients should choose a reasonable value for `expires_ts` in case they go
    offline unexpectedly. If the user stays connected for longer than this time,
    the client must actively update the state event with a new expiration
    timestamp. A device must be ignored if the `expires_ts` field indicates it
    has expired, or if the user's `m.room.member` event's membership field is
    not `join`.
  * `feeds` - An array of feeds the member is sharing, which the opponent
    member may reference when setting up their WebRTC connection.
    * `purpose` - Either `m.usermedia` or `m.screenshare`; a feed with any
      other value should be ignored.
|
||||||||||||||
For instance: | ||||||||||||||
|
||||||||||||||
```jsonc | ||||||||||||||
{ | ||||||||||||||
"type": "m.call.member", | ||||||||||||||
// Review comment: todo: actually track here whether the participant is joined to the call or not(!)
// Review comment: Yeah we still have an issue with tracking participants for a given group call for displaying in the UI. How are we going to check who is in a call and scale it?
||||||||||||||
"state_key": "@matthew:matrix.org", | ||||||||||||||
"content": { | ||||||||||||||
"m.calls": [ | ||||||||||||||
|
||||||||||||||
{ | ||||||||||||||
"m.call_id": "cvsiu2893", | ||||||||||||||
"m.devices": [ | ||||||||||||||
{ | ||||||||||||||
"device_id": "ASDUHDGFYUW", // Used to target to-device messages | ||||||||||||||
"session_id": "GHKJFKLJLJ", // Used to resolve duplicate calls from a device | ||||||||||||||
"expires_ts": 1654616071686, | ||||||||||||||
"feeds": [ | ||||||||||||||
{ | ||||||||||||||
"purpose": "m.usermedia", | ||||||||||||||
"id": "qegwy64121wqw", // WebRTC MediaStream id | ||||||||||||||
// Review comment: These IDs are only accurate if announcing to an SFU - in full mesh, each separate call will have its own IDs. Therefore we should probably scope these to a given 1:1 call ID?
||||||||||||||
"tracks": [ | ||||||||||||||
{ | ||||||||||||||
"kind": "audio", | ||||||||||||||
"id": "zvhjiwqsx", // WebRTC MediaStreamTrack id | ||||||||||||||
"label": "Sennheiser Mic", | ||||||||||||||
// Review comment: Actually, perhaps this is a bit of a privacy violation - why do other people in a conference need to know what my devices are called?
// Review comment: Yeah, right. Should we remove it? (I also don't seem to find a case where we would want to let others know what our devices are called when publishing 🤔)
// Review comment: Yeah, I can't really see a good reason we'd send this. The
||||||||||||||
"settings": { // WebRTC MediaTrackSettings object | ||||||||||||||
"channelCount": 2, | ||||||||||||||
"sampleRate": 48000, | ||||||||||||||
"m.maxbr": 32000 // Matrix-specific extension to advertise the max bitrate of this track
} | ||||||||||||||
}, | ||||||||||||||
{ | ||||||||||||||
"kind": "video", | ||||||||||||||
"id": "zbhsbdhzs", | ||||||||||||||
"label": "Logitech Webcam", | ||||||||||||||
"settings": { | ||||||||||||||
"width": 1280, | ||||||||||||||
"height": 720, | ||||||||||||||
"facingMode": "user", | ||||||||||||||
"frameRate": 30.0, | ||||||||||||||
"m.maxbr": 512000
// Review comment on lines +193 to +198: Are these I'm just wondering if they are useful in a general case (like why would we need the information about a
// The only case where I assume the information about camera mode etc might be useful is when there is a specific app that runs over Matrix and needs to advertise the properties of the video/audio streams in order to implement a specific logic. But in this case, we're talking about application-specific data, i.e. something that must be the logic of the app rather than part of a [generic] Matrix protocol.
// I think generally we only need the stream and track IDs, a purpose (for the use case of conference / using WebRTC for calls), and, perhaps basic information about certain tracks like the width and height of the video (theoretically it's not required, because we'll be able to access it when the track is received, but practically we would need it for the simulcast implementation on the SFU side, so such information would be useful for the conference use cases).
||||||||||||||
} | ||||||||||||||
}
]
}, | ||||||||||||||
{ | ||||||||||||||
"purpose": "m.screenshare", | ||||||||||||||
"id": "suigv372y8378", | ||||||||||||||
"tracks": [ | ||||||||||||||
{ | ||||||||||||||
"kind": "video", | ||||||||||||||
"id": "xbhsbdhzs", | ||||||||||||||
"label": "My Screenshare", | ||||||||||||||
"settings": { | ||||||||||||||
"width": 3072, | ||||||||||||||
"height": 1920, | ||||||||||||||
"cursor": "moving", | ||||||||||||||
"displaySurface": "monitor", | ||||||||||||||
"frameRate": 30.0, | ||||||||||||||
"m.maxbr": 768000
} | ||||||||||||||
}
] | ||||||||||||||
} | ||||||||||||||
] | ||||||||||||||
} | ||||||||||||||
] | ||||||||||||||
} | ||||||||||||||
]
} | ||||||||||||||
} | ||||||||||||||
``` | ||||||||||||||
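As a rough illustration of the filtering rules described for `m.devices`, the following Python sketch returns the devices of one member that a client would still consider part of a call. The function name `active_devices` and its parameters are hypothetical, and treating a missing or non-integer `expires_ts` as expired is an assumption rather than something the proposal states.

```python
import time
from typing import Optional


def active_devices(member_event: dict, membership: str, call_id: str,
                   now_ms: Optional[int] = None) -> list[dict]:
    """Return the non-stale devices a member advertises for call_id.

    membership is the value from the user's m.room.member event.
    """
    # A device must be ignored unless the user is joined to the room.
    if membership != "join":
        return []
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # expires_ts is in milliseconds

    devices = []
    for call in member_event.get("content", {}).get("m.calls", []):
        if call.get("m.call_id") != call_id:
            continue
        for device in call.get("m.devices", []):
            expires = device.get("expires_ts")
            # Assumption: a missing or malformed expires_ts counts as expired.
            if isinstance(expires, int) and expires > now_ms:
                devices.append(device)
    return devices
```

The caller is still expected to check that `call_id` refers to an unterminated `m.call` event before using the result.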
|
||||||||||||||
This builds on [MSC3077](https://github.com/matrix-org/matrix-spec-proposals/pull/3077), which describes streams in `m.call.*` events via a `sdp_stream_metadata` field. | ||||||||||||||
|
||||||||||||||
**TODO: Do we need all of this data? Why would we need it?** **TODO: This | ||||||||||||||
doesn't follow the MSC3077 format very well - can we do something about that?** | ||||||||||||||
**TODO: Add tracks field** **TODO: Add bitrate/format fields** | ||||||||||||||
|
||||||||||||||
Clients should do their best to ensure that calls in `m.call.member` state are removed when the member leaves the call. However, there will be cases where the device loses network connectivity, power, the application is forced closed, or it crashes. If the `m.call.member` state has stale device data the call setup will fail. Clients should re-attempt invites up to 3 times before giving up on calling a member. | ||||||||||||||
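The retry behaviour could look something like the Python sketch below. It assumes that "up to 3 times" means three attempts in total, which is one possible reading; `send_invite` is a hypothetical callback that returns `True` once the remote device has responded to the invite.

```python
from typing import Callable

# Assumption: three attempts in total, not one attempt plus three retries.
MAX_INVITE_ATTEMPTS = 3


def invite_with_retries(send_invite: Callable[[], bool],
                        max_attempts: int = MAX_INVITE_ATTEMPTS) -> int:
    """Try to invite a member's device, giving up after max_attempts.

    Returns the number of attempts used on success, or 0 when giving up.
    """
    for attempt in range(1, max_attempts + 1):
        if send_invite():
            return attempt
    return 0
```

A real client would probably also wait between attempts and stop early once the device's `expires_ts` has passed; neither is shown here.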
|
||||||||||||||
### Call setup | ||||||||||||||
|
||||||||||||||
In a full mesh call, for any two participants, the one with the lexicographically lower user ID is responsible for calling the other. If two participants share the same user ID (that is, if a user has joined the call from multiple devices), then the one with the lexicographically lower device ID is responsible for calling the other. | ||||||||||||||
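A minimal Python sketch of this tie-breaking rule is shown below. The function name is hypothetical, and comparing strings by Unicode code point (Python's default) is an assumed reading of "lexicographically lower", since the text does not name a collation.

```python
def is_responsible_for_calling(local: tuple[str, str],
                               remote: tuple[str, str]) -> bool:
    """Decide whether the local participant should send the invite.

    Both arguments are (user_id, device_id) pairs. Tuple comparison checks
    the user ID first and falls back to the device ID when the user IDs match.
    """
    # Assumption: plain code-point ordering of the ID strings.
    return local < remote
```

For example, `is_responsible_for_calling(("@alice:example.org", "AAA"), ("@bob:example.org", "BBB"))` is true, so Alice's device places the call and Bob's device waits for the invite.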
|
||||||||||||||
Call setup then uses the normal `m.call.*` events, except they are sent over to-device messages to the relevant devices (encrypted via Olm). This means: | ||||||||||||||
|
||||||||||||||
* When initiating a 1:1 call, the `m.call.invite` is sent to the devices listed in the `m.call.member` event's `m.devices` array, using the `device_id` field.
* `m.call.*` events sent via to-device messages should also include the following properties in their content:
  * `conf_id` - The group call ID listed in `m.call`.
  * `dest_session_id` - The recipient's session ID. Incoming messages with a
    `dest_session_id` that doesn't match your current session ID should be
    discarded.
  * `seq` - The sequence number of the to-device message. This is needed because
    the order of to-device messages is not guaranteed. The number starts at `0`
    and is incremented by `1` with each new to-device message.
> Review comment: Can we specify here whether this counter is scoped to the
|
||||||||||||||
* In addition to the fields above, `m.call.invite` events sent via to-device messages should include the following properties:
  * `device_id` - The message sender's device ID. Used by the opponent member to send response to-device signalling messages even if the `m.call.member` event has not been received yet.
  * `sender_session_id` - Like the `device_id`, the `sender_session_id` is used
    by the opponent member to filter out messages unrelated to the sender's
    session even if the `m.call.member` event has not been received yet.
* For 1:1 calls, we might want to let the to-device messages flow and cause the client to ring even before the `m.call` event propagates, to minimise latency. Therefore we'll need to include an `m.intent` on the `m.call.invite` too. | ||||||||||||||
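To illustrate how a receiver might apply `dest_session_id` and `seq`, here is a small Python sketch of a reorder buffer. The class name is hypothetical, and it assumes one buffer per remote sender session and call, because the scope of the `seq` counter is not pinned down in the text above.

```python
class ToDeviceReorderBuffer:
    """Deliver to-device m.call.* contents in seq order for one sender."""

    def __init__(self, local_session_id: str):
        self.local_session_id = local_session_id
        self.next_seq = 0                    # seq starts at 0
        self.pending: dict[int, dict] = {}   # out-of-order messages, by seq

    def push(self, content: dict) -> list[dict]:
        """Accept one message; return the messages now deliverable in order."""
        # Messages addressed to another session must be discarded.
        if content.get("dest_session_id") != self.local_session_id:
            return []
        seq = content.get("seq")
        # Assumption: malformed, duplicate or already-delivered seq is dropped.
        if not isinstance(seq, int) or seq < self.next_seq:
            return []
        self.pending[seq] = content
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

If message `1` arrives before message `0`, the first `push` returns an empty list and the second returns both messages in order. A production client would also need a timeout so that a lost message does not stall the call forever.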
|
||||||||||||||
## Example Diagrams | ||||||||||||||
|
||||||||||||||
### Legend | ||||||||||||||
|
||||||||||||||
| Arrow Style | Description | | ||||||||||||||
|-------------|-------------| | ||||||||||||||
| Solid | [State Event](https://spec.matrix.org/latest/client-server-api/#types-of-room-events) | | ||||||||||||||
| Dashed | [Event (sent as to-device message)](https://spec.matrix.org/latest/client-server-api/#send-to-device-messaging) | | ||||||||||||||
|
||||||||||||||
### Basic Call | ||||||||||||||
|
||||||||||||||
```mermaid | ||||||||||||||
sequenceDiagram | ||||||||||||||
autonumber | ||||||||||||||
participant Alice | ||||||||||||||
participant Room | ||||||||||||||
participant Bob | ||||||||||||||
Alice->>Room: m.call | ||||||||||||||
Alice->>Room: m.call.member | ||||||||||||||
Bob->>Room: m.call.member | ||||||||||||||
Alice-->>Bob: m.call.invite | ||||||||||||||
Alice-->>Bob: m.call.candidates | ||||||||||||||
Alice-->>Bob: m.call.candidates | ||||||||||||||
Bob-->>Alice: m.call.answer | ||||||||||||||
Bob-->>Alice: m.call.candidates | ||||||||||||||
Alice-->>Bob: m.call.select_answer | ||||||||||||||
%% Review comment: I wonder if the AIUI, the purpose of Perhaps to stop ringing, the other devices should just monitor the
||||||||||||||
``` | ||||||||||||||
|
||||||||||||||
## Potential issues | ||||||||||||||
|
||||||||||||||
To-device messages are point-to-point between servers, whereas today's `m.call.*` messages can transitively traverse servers via the room DAG, thus working around federation problems. In practice if you are relying on that behaviour, you're already in a bad place. | ||||||||||||||
|
||||||||||||||
## Alternatives | ||||||||||||||
|
||||||||||||||
There are many different ways to do this. The main alternative considered was not to use state events to track membership, but instead to gossip it via either to-device or WebRTC data channel (DC) messages between participants. This fell apart, however, due to trust: you effectively end up reinventing large parts of Matrix layered on top of to-device or DC. So you might as well publish and distribute the participation data in the DAG rather than reinvent the wheel.
|
||||||||||||||
An alternative to to-device messages is to use DMs. You still risk gappy sync problems, though, due to lots of traffic, as well as the hassle of creating DMs and requiring canonical DMs to set up the calls. It does make debugging easier, though, compared with having to track encrypted ephemeral to-device messages.
|
||||||||||||||
## Security considerations | ||||||||||||||
|
||||||||||||||
State events are not encrypted currently, and so this leaks that a call is happening, and who is participating in it, and from which devices. | ||||||||||||||
> Review comment: plus that it happened in the past and who there-and-then participated in it, by correlating it corresponding
||||||||||||||
|
||||||||||||||
Malicious users in a room could try to sabotage a conference by overwriting the `m.call` state event of the current ongoing call. | ||||||||||||||
|
||||||||||||||
## Unstable prefix | ||||||||||||||
|
||||||||||||||
| stable event type | unstable event type | | ||||||||||||||
|-------------------|---------------------| | ||||||||||||||
| m.call | org.matrix.msc3401.call | | ||||||||||||||
| m.call.member | org.matrix.msc3401.call.member | |