| 1 | +# Body field as media caption |
| 2 | + |
| 3 | +When sending images or other attachments, users often want to include text to |
| 4 | +convey additional information. Most chat platforms offer media captions as a |
| 5 | +first-class feature, allowing users to choose the attachment and write text, |
| 6 | +then send both together in one message. |
| 7 | + |
| 8 | +Matrix currently does not support this at the protocol level: at best, clients |
| 9 | +can emulate the behavior by sending two messages in quick succession; at worst, |
| 10 | +the user has to do that manually. With separate messages, the second message |
| 11 | +can be delayed or lost if something goes wrong. |
| 12 | + |
| 13 | +## Proposal |
| 14 | + |
| 15 | +This proposal allows the `filename` field from [`m.file`], as well as the |
| 16 | +`format` and `formatted_body` fields from [`m.text`], on all media msgtypes |
| 17 | +(`m.image`, `m.audio`, `m.video`, `m.file`). It does not affect the `m.location` |
| 18 | +msgtype, nor the separate `m.sticker` event type: stickers already use `body` |
| 19 | +as a description, and locations don't have file names. |
| 20 | + |
| 21 | +If the `filename` field is present in a media message, clients should treat |
| 22 | +`body` as a caption instead of a file name. If the `format`/`formatted_body` |
| 23 | +fields are present in addition to `filename` and `body`, then they should take |
| 24 | +priority as the caption text. Formatted text in media captions is rendered the |
| 25 | +same way as formatted text in `m.text` messages. |
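
As an illustration of the sending side, the rules above could be sketched roughly as follows. This is a minimal sketch rather than a reference implementation: the helper name `build_media_content`, its parameters, and the `mxc://example.org/...` URI used with it are hypothetical and not part of the proposal, and fields such as `info` are left out for brevity.

```python
from typing import Any, Dict, Optional

# Media msgtypes covered by the proposal.
MEDIA_MSGTYPES = {"m.image", "m.audio", "m.video", "m.file"}


def build_media_content(
    msgtype: str,
    url: str,
    filename: str,
    caption: Optional[str] = None,
    formatted_caption: Optional[str] = None,
) -> Dict[str, Any]:
    """Build message content for a media message (hypothetical helper)."""
    if msgtype not in MEDIA_MSGTYPES:
        raise ValueError(f"{msgtype} is not a media msgtype")

    content: Dict[str, Any] = {"msgtype": msgtype, "url": url}

    if caption is None:
        # No caption: `body` keeps its existing role as the file name.
        content["body"] = filename
        return content

    # Caption present: `filename` carries the file name, `body` the caption.
    # Note that a caption identical to the file name would not be treated
    # as a caption by receivers following the proposal.
    content["filename"] = filename
    content["body"] = caption
    if formatted_caption is not None:
        content["format"] = "org.matrix.custom.html"
        content["formatted_body"] = formatted_caption
    return content
```

For example, `build_media_content("m.image", "mxc://example.org/abc", "cat.jpeg", "this is a cat picture :3")` yields content with `filename` set to the file name and `body` set to the caption.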
| 26 | + |
| 27 | +The current spec is somewhat ambiguous as to how `body` should be handled and |
| 28 | +the definition varies across different message types. The current spec for |
| 29 | +[`m.image`] describes `body` as |
| 30 | + |
| 31 | +> A textual representation of the image. This could be the alt text of the |
| 32 | +> image, the filename of the image, or some kind of content description for |
| 33 | +> accessibility e.g. ‘image attachment’. |
| 34 | +
|
| 35 | +while [`m.audio`] describes it as |
| 36 | + |
| 37 | +> A description of the audio e.g. ‘Bee Gees - Stayin’ Alive’, or some kind of |
| 38 | +> content description for accessibility e.g. ‘audio attachment’. |
| 39 | + |
| 40 | +In practice, clients (or at least Element) use it as the file name. As a part |
| 41 | +of adding captions, the `body` field for all media message types is explicitly |
| 42 | +defined to be used as the file name when the `filename` field is not present. |
| 43 | + |
| 44 | +For `m.file` messages, the [current (v1.9) spec][`m.file`] confusingly defines |
| 45 | +`filename` as "The original filename of the uploaded file" and simultaneously |
| 46 | +recommends that `body` is "the filename of the original upload", effectively |
| 47 | +saying both fields should have the file name. In order to avoid (old) messages |
| 48 | +with both fields being misinterpreted as having captions, the `body` field |
| 49 | +should not be used as a caption when it's equal to `filename`. |
| 50 | + |
| 51 | +[`m.file`]: https://spec.matrix.org/v1.9/client-server-api/#mfile |
| 52 | +[`m.text`]: https://spec.matrix.org/v1.9/client-server-api/#mtext |
| 53 | +[`m.image`]: https://spec.matrix.org/v1.9/client-server-api/#mimage |
| 54 | +[`m.audio`]: https://spec.matrix.org/v1.9/client-server-api/#maudio |
| 55 | + |
| 56 | +### Examples |
| 57 | +<details> |
| 58 | +<summary>Image with caption</summary> |
| 59 | + |
| 60 | +```json |
| 61 | +{ |
| 62 | + "msgtype": "m.image", |
| 63 | + "url": "mxc://maunium.net/HaIrXlnKfEEHvMNKzuExiYlv", |
| 64 | + "filename": "cat.jpeg", |
| 65 | + "body": "this is a cat picture :3", |
| 66 | + "info": { |
| 67 | + "w": 479, |
| 68 | + "h": 640, |
| 69 | + "mimetype": "image/jpeg", |
| 70 | + "size": 27253 |
| 71 | + }, |
| 72 | + "m.mentions": {} |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +</details> |
| 77 | +<details> |
| 78 | +<summary>File with formatted caption</summary> |
| 79 | + |
| 80 | +```json |
| 81 | +{ |
| 82 | + "msgtype": "m.file", |
| 83 | + "url": "mxc://maunium.net/TizWsLhHfDCETKRXdDwHoAGn", |
| 84 | + "filename": "hello.txt", |
| 85 | + "body": "this caption is longer than the file itself 🤔", |
| 86 | + "format": "org.matrix.custom.html", |
| 87 | + "formatted_body": "this <strong>caption</strong> is longer than the file itself 🤔", |
| 88 | + "info": { |
| 89 | + "mimetype": "text/plain", |
| 90 | + "size": 14 |
| 91 | + }, |
| 92 | + "m.mentions": {} |
| 93 | +} |
| 94 | +``` |
| 95 | + |
| 96 | +</details> |
| 97 | + |
| 98 | +### Summary |
| 99 | +* `filename` is defined for all media msgtypes. |
| 100 | +* `body` is defined to be a caption when `filename` is present and not equal to `body`. |
| 101 | + * `format` and `formatted_body` are allowed as well for formatted captions. |
| 102 | +* `body` is defined to be the file name when `filename` is not present. |
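
On the receiving side, the rules summarized above could be applied along these lines. This is only a sketch under stated assumptions: `MediaText` and `interpret_media_text` are hypothetical names, an empty `filename` is treated the same as a missing one, and the formatted caption is only used when `format` is `org.matrix.custom.html`; none of these details are spelled out by the proposal.

```python
from typing import Any, Dict, NamedTuple, Optional


class MediaText(NamedTuple):
    """File name and caption extracted from a media message (hypothetical type)."""
    filename: Optional[str]
    caption: Optional[str]
    formatted_caption: Optional[str]


def interpret_media_text(content: Dict[str, Any]) -> MediaText:
    """Decide how to interpret `body` in media message content."""
    body = content.get("body")
    filename = content.get("filename")

    # `filename` not present: `body` is the file name, there is no caption.
    # (Treating an empty string as "not present" is an assumption.)
    if not filename:
        return MediaText(body, None, None)

    # `body` equal to `filename`: an (old) m.file message with the name in
    # both fields, so it must not be misread as a caption.
    if body == filename:
        return MediaText(filename, None, None)

    # Otherwise `body` is the caption; the formatted variant, when present,
    # takes priority for rendering.
    formatted = None
    if content.get("format") == "org.matrix.custom.html":
        formatted = content.get("formatted_body")
    return MediaText(filename, body, formatted)
```

A client would render `formatted_caption` when it is set and fall back to `caption` otherwise, while always using `filename` for downloads.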
| 103 | + |
| 104 | +## Potential issues |
| 105 | + |
| 106 | +In clients that don't show the file name anywhere, the caption would not be |
| 107 | +visible at all. However, extensible events would run into the same issue. |
| 108 | +Having captions implemented beforehand may even help clients eventually |
| 109 | +implement extensible events. |
| 110 | + |
| 111 | +Old clients may default to using the caption as the file name when the user |
| 112 | +wants to download a file, which makes for a somewhat awkward user experience. |
| 113 | + |
| 114 | +## Alternatives |
| 115 | + |
| 116 | +### [MSC2529](https://github.com/matrix-org/matrix-spec-proposals/pull/2529) |
| 117 | + |
| 118 | +MSC2529 would allow existing clients to render captions without any changes, |
| 119 | +but the use of relations makes implementation more difficult, especially for |
| 120 | +bridges. It would require either waiting a predefined amount of time for the |
| 121 | +caption to come through, or editing the message on the target platform (if |
| 122 | +edits are supported). |
| 123 | + |
| 124 | +The format proposed by MSC2529 would also make it technically possible to use |
| 125 | +other message types as captions without changing the format of the events, |
| 126 | +which is not possible with this proposal. |
| 127 | + |
| 128 | +### Extensible events |
| 129 | + |
| 130 | +Like MSC2529, this proposal would be obsoleted by [extensible events](https://github.com/matrix-org/matrix-spec-proposals/pull/3552). |
| 131 | +However, fully switching to extensible events requires significantly more |
| 132 | +implementation work, and it may take years before the necessary time is |
| 133 | +allocated for that. |
| 134 | + |
| 135 | +## Security considerations |
| 136 | + |
| 137 | +This proposal doesn't involve any security-sensitive components. |
| 138 | + |
| 139 | +## Unstable prefix |
| 140 | + |
| 141 | +The fields being added already exist in other msgtypes, so unstable prefixes |
| 142 | +don't seem necessary. Additionally, using `body` as a caption could already be |
| 143 | +considered spec-compliant due to the ambiguous definition of the field, and |
| 144 | +only adding unstable prefixes for the other fields would serve little purpose. |