MSC3554: Extensible Events - Translatable Text #3554
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# MSC3554: Extensible Events - Translatable Messages | ||
|
||
[MSC1767](https://github.com/matrix-org/matrix-doc/pull/1767) describes Extensible Events in detail, | ||
though deliberately does not include schemas for some messaging types. This MSC covers translations | ||
on `m.text` content blocks specifically. | ||
|
||
*Rationale*: Splitting the MSCs down into individual parts makes it easier to implement and review in | ||
stages without blocking other pieces of the overall idea. For example, an issue with the way images | ||
are represented should not block the overall schema from going through. | ||
|
||
**Note**: As a second priority MSC in the Extensible Events series, this MSC is not proposed to be a | ||
blocker on extensible events entering the specification - this can mean that when extensible events | ||
are available, translations might not be (in stable form). Readers should consider the unstable prefix | ||
section for early support of this MSC. | ||
|
||
## Proposal | ||
|
||
As defined by MSC1767, `m.text` currently specifies an array with "representations" for the text | ||
on the event. Clients are expected to find the first representation they can render based on mimetype, | ||
which can be implicit. | ||
|
||
Sender-provided translations can be useful in contexts where the sender knows multiple languages are | ||
used in a room, such as announcements or other prepared communications. They are less common in instant | ||
messaging (at least without a translation service on the sender's side). Where translations are provided, | ||
the receiving client can use the representation which matches the user's preferred language. | ||
|
||
This MSC adds an additional key, `lang`, to the `m.text` representation schema and adjusts how a | ||
client chooses a representation so that language is part of that decision. The array overall | ||
is still ordered, which means the sender should place the language which best fits the scenario | ||
first. | ||
|
||
An example: | ||
|
||
```json5 | ||
{ | ||
"type": "m.message", | ||
"content": { | ||
"m.text": [ | ||
{ | ||
"body": "Je suis un poisson", | ||
"lang": "fr" | ||
}, | ||
{ | ||
"body": "I am a fish", | ||
"lang": "en" | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
|
||
*Note*: `m.text`'s support for `mimetype` has been excluded from the example for brevity. It is still | ||
supported in events. | ||
|
||
By default, a representation without a `lang` is assumed to be in English (`en`).
Review comment: I don't think it makes a lot of sense to assume a language in this case. There is no foolproof way to guess a language, so many clients will probably default to just sending whatever the user typed without a language. What is the benefit of assuming English, if that is probably wrong in a lot of cases? Shouldn't it rather just be unspecified?

Reply: The vast majority of software in the ecosystem makes assumptions about text being English. This is just to help implementations which might be searching for a language code, not to define the language itself. Unspecified leads to all kinds of issues with software, whereas French-as-default-English is generally fine.

anoadragon453 marked this conversation as resolved.
|
||
|
||
`lang` must be a valid language code under [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt). This is | ||
in line with the HTML specification which uses a similar attribute on the `<html>` node. | ||
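
The proposal does not spell out a matching algorithm, so the following is only a minimal sketch of how a receiving client might pick a representation. The function name `pick_representation`, the `preferred` list of user languages, and the "exact tag, then primary subtag, then first item" strategy are all assumptions made for illustration. Mimetype handling is left out entirely, and a real client may prefer the fuller matching schemes described in RFC 4647.

```python
from typing import Optional

DEFAULT_LANG = "en"  # the proposal's default when `lang` is absent


def pick_representation(representations: list[dict], preferred: list[str]) -> Optional[dict]:
    """Return the representation that best matches the user's languages.

    Assumed strategy (not mandated by the proposal): for each preferred
    language in order, take the first representation whose tag matches
    exactly, then the first one sharing the primary subtag. If nothing
    matches, fall back to the first representation, which the sender
    ordered as the best fit.
    """
    if not representations:
        return None

    def lang_of(rep: dict) -> str:
        # BCP 47 tags are case-insensitive, so compare in lowercase.
        return str(rep.get("lang", DEFAULT_LANG)).lower()

    for want in (p.lower() for p in preferred):
        for rep in representations:
            if lang_of(rep) == want:
                return rep
        primary = want.split("-")[0]
        for rep in representations:
            if lang_of(rep).split("-")[0] == primary:
                return rep
    return representations[0]


# The `m.text` array from the example event above.
m_text = [
    {"body": "Je suis un poisson", "lang": "fr"},
    {"body": "I am a fish", "lang": "en"},
]
print(pick_representation(m_text, ["en-GB", "de"])["body"])  # prints "I am a fish"
```

Falling back to the first array item keeps the sender's ordering meaningful when none of the user's languages are present.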
|
||
There is no specific guidance for when to use translation support, though cases can include automatic machine | ||
translation, bots with internationalization support, and possibly some bridges. | ||
turt2live marked this conversation as resolved.
|
||
|
||
## Potential issues | ||
|
||
The language code spec might not encompass all of the possible language code combinations, but should cover | ||
plenty given its popularity in HTML. | ||
|
||
If a sending client supports several languages, receiving clients could spend extra time attempting to find | ||
a suitable representation to render. This is considered a non-issue, though clients should consider how to | ||
efficiently search an array. | ||
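
One possible way to keep lookups cheap when an event carries many translations is to index the array by language once and reuse that index. This is a sketch under assumptions: `index_by_lang` is a hypothetical helper, and treating a missing `lang` as `en` follows the default stated in the proposal.

```python
def index_by_lang(representations: list[dict]) -> dict[str, dict]:
    """Build a lowercase language tag -> representation lookup table."""
    index: dict[str, dict] = {}
    for rep in representations:
        lang = str(rep.get("lang", "en")).lower()
        # setdefault keeps the earliest entry per language, which
        # preserves the sender's ordering preference.
        index.setdefault(lang, rep)
    return index
```

For the handful of representations a typical event is likely to carry, a plain linear scan is probably just as good; the index only pays off if the same array is queried repeatedly.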
|
||
## Alternatives | ||
|
||
No significant alternatives known. | ||
|
||
## Security considerations | ||
|
||
No specific considerations are required for this proposal. | ||
|
||
## Unstable prefix | ||
|
||
While this MSC is not considered stable, implementations should use `org.matrix.msc3554.lang` instead of `lang` | ||
when sending events. | ||
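
As a rough illustration of the prefix rule, a sending implementation might choose the key name in one place so that switching to the stable form later is a one-line change. The helper names `make_representation` and `read_lang` are hypothetical, and having receivers accept either key is an assumption rather than something this proposal requires.

```python
UNSTABLE_LANG_KEY = "org.matrix.msc3554.lang"
STABLE_LANG_KEY = "lang"


def make_representation(body: str, lang: str, stable: bool = False) -> dict:
    """Build one `m.text` representation, using the unstable key by default."""
    key = STABLE_LANG_KEY if stable else UNSTABLE_LANG_KEY
    return {"body": body, key: lang}


def read_lang(rep: dict) -> str:
    """Read the language of a representation.

    Assumption: the stable key wins if present, then the unstable key,
    then the proposal's default of "en".
    """
    return rep.get(STABLE_LANG_KEY, rep.get(UNSTABLE_LANG_KEY, "en"))


m_text = [
    make_representation("Je suis un poisson", "fr"),
    make_representation("I am a fish", "en"),
]
```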
|
||
Note that extensible events should only be used in an appropriate room version as well. |
Moved to a comment thread, a question from @noaho:

Perhaps we could add a new boolean field to an `m.text` object which indicates that a given translation was the one that the user originally typed/entered? Such as `lang_source: true|false`?