MSC4139: Bot buttons & conversations #4139
@@ -0,0 +1,201 @@ | ||
# MSC4139: Bot buttons & conversations | ||
|
||
Nearly all bots and bridges in the Matrix ecosystem use a text-based interface to support their | ||
operations. These interfaces are typically highly structured commands and require the user to know | ||
the entire incantation for the action they want to invoke, making them feel like "power user" | ||
features. | ||
|
||
Further, interacting with bots today is extremely transactional: the user sends a command and the | ||
bot performs the action as-is or spews errors back at the user due to a typo. If an error was | ||
returned, the entire command needs to be re-run. | ||
|
||
A more user-friendly approach is to have the user provide the bot with information as needed, | ||
without having to guess at the bot's current state. This proposal calls such an approach a | ||
"conversation" with the bot - the user does something to "start" the conversation, and the bot | ||
provides a limited set of prompts to continue the conversation. This repeats until the conversation | ||
ends (usually by the bot saying so explicitly). Users may hold multiple concurrent conversations | ||
with bots. Conversation starters are deliberately left as a bot implementation detail in this | ||
proposal to allow the ecosystem to explore this new interaction technique. Examples may include the | ||
user opening a DM with the bot, sending a `!command` message, or, in future, sending a slash command | ||
like `/start`. | ||
|
||
This conversation approach is heavily inspired by platforms like Telegram. | ||
|
||
## Proposal | ||
|
||
A new `m.prompts` [mixin](https://github.com/matrix-org/matrix-spec-proposals/blob/main/proposals/1767-extensible-events.md#mixins-specifically-allowed) | ||
is specified which describes actions another user in the room can take to further the conversation. | ||
|
||
The `m.prompts` mixin contains some scoping parameters, rendering hints, and the actual prompts | ||
themselves. For example, when applied to an `m.message` event, the mixin may look like the | ||
following: | ||
|
||
*Note*: The JSON comments are normative, and irrelevant fields are not shown. | ||
|
||
```jsonc | ||
{ | ||
"type": "m.message", | ||
"sender": "@bot:example.org", | ||
"content": { | ||
"m.text": [ | ||
{"body": "Hello! Say <code>!roll [dice]</code> to roll some dice.", "mimetype": "text/html"}, | ||
{"body": "Hello! Say `!roll [dice]` to get started."} | ||
], | ||
"m.prompts": { | ||
// Clients which recognize `m.prompts` would use `intro` to render the event instead. This | ||
// allows the remainder of the event to be a fallback for unsupported clients. | ||
"intro": { | ||
"type": "m.message", | ||
"content": { | ||
"m.text": [ | ||
{"body": "Hello! What would you like to roll today?"} | ||
] | ||
} | ||
}, | ||
// These are the users who should see the `prompts`. Other users may see something like "you | ||
// do not have permission to reply to this message" instead of prompts. `scope` is optional: | ||
// when not supplied, all users who can see the message can respond. When an empty array, no | ||
// one can respond. Clients SHOULD NOT show prompts to users who are descoped. | ||
"scope": [ | ||
"@alice:example.org", | ||
"@bob:example.org" | ||
], | ||
// Review comment on lines +55 to +62 (review thread, not part of the proposal): not needed but it
// would be really nice to generalise this with a whisper MSC.
// Reply (review thread): The intent here is to very much ensure interactions are visible to the
// room. Bots looking to whisper the user can utilize DMs for now. |
||
// These are the options a user has. Note the 2 distinct types and 3 label approaches. | ||
"prompts": [ | ||
{ | ||
// `type` is the prompt type: "preset" (show a button) or "input" (shown below) | ||
"type": "preset", | ||
// `id` is used by the bot to figure out what prompt the user picked. It is an opaque ID. | ||
"id": "1d6", | ||
// Review comment (review thread, not part of the proposal): This seems to indicate that
// interactions are stateful? I'd been thinking about a similar-ish system, but making the client
// just send a eg. a bot would define something like:
//
//   // other content stuffs ...
//   "interactions": [
//     {
//       "type": "command",
//       "metadata": {
//         "action": "roll",
//         "dice_type": "1d6"
//       }
//     }
//   ]
//
// When a user clicks this command button, the client sends an event:
//
//   {
//     "type": "m.interaction",
//     "content": {
//       "metadata": {
//         "action": "roll",
//         "dice_type": "1d6"
//       },
//       // reply stuff
//     }
//   }
//
// I think, personally, I'd love to see a system that sticks more to existing interaction systems in
// order to avoid extra maintenance burden, and to make it easier to onboard clients.
//
// Reply (review thread): Bots can GET the previous event if they don't keep state themselves. |
||
// `label` is an extensible event with deliberately no `type`. | ||
"label": { | ||
"m.text": [{"body": "1 six sided die"}] | ||
} | ||
}, { | ||
"type": "preset", | ||
"id": "surprise", | ||
"label": { | ||
// This should render as an image event, hopefully | ||
// Requires https://github.com/matrix-org/matrix-spec-proposals/pull/3552 | ||
"m.text": [{"body": "🎲❓"}], // fallback | ||
"m.file": { | ||
"url": "mxc://example.org/abc123" | ||
}, | ||
"m.image_details": { | ||
// Clients should impose maximums and minimums here. | ||
"width": 16, | ||
"height": 16 | ||
}, | ||
"m.alt_text": { | ||
"m.text": [{"body": "An image of a 6 sided die with a red question mark over it"}] | ||
} | ||
} | ||
}, { | ||
"type": "input", | ||
"id": "custom", | ||
// Regex the client can use to test input locally. Optional - if not provided the client | ||
// should accept *any* input, including an empty string. | ||
"validator": "[0-9]+d[0-9]+", // `2d20`, etc | ||
"label": { | ||
"m.text": [{"body": "Other"}] | ||
} | ||
} | ||
] | ||
} | ||
} | ||
} | ||
``` | ||
|
||
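The `scope` rules in the example's comments can be sketched as a small client-side check. This is a minimal illustration rather than part of the proposal; the function name `can_respond` is hypothetical.

```python
from typing import Any


def can_respond(prompts_mixin: dict[str, Any], user_id: str) -> bool:
    """Return True if `user_id` should be shown the prompts."""
    scope = prompts_mixin.get("scope")
    if scope is None:
        # `scope` omitted: everyone who can see the message may respond.
        return True
    # `scope` present: only listed users may respond; an empty list means no one.
    return user_id in scope
```

A client would call this with the `m.prompts` object and the local user's ID before deciding whether to draw the buttons or a "you do not have permission" notice.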
In this example, clients which don't support the mixin will see the old-style `!roll [dice]` help text, | ||
allowing the user to continue interacting if needed. Over time, bots may wish to drop this fallback | ||
style and instead use a message like `Hello! Your client doesn't support talking to me :(`. | ||
|
||
Clients which do support `m.prompts` will instead render the `intro` object as the event. It's not | ||
required that the `intro.type` matches the top level event `type`, though it is considered good | ||
practice to do so. The `intro` block is primarily intended to allow senders to tailor their message | ||
for supported clients, as the intent for this proposal is to discourage commands like `!roll` where | ||
possible. | ||
|
||
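How a client might choose between `intro` and the fallback content can be sketched as follows. The names `event_to_render` and `supports_prompts` are hypothetical, and falling back to the top-level content when `intro` is absent is an assumption the proposal does not spell out.

```python
from typing import Any


def event_to_render(event: dict[str, Any], supports_prompts: bool) -> dict[str, Any]:
    """Pick the type/content pair a client should render for an event."""
    content = event.get("content", {})
    mixin = content.get("m.prompts")

    # Supporting clients render `intro` in place of the top-level event.
    if supports_prompts and isinstance(mixin, dict) and "intro" in mixin:
        return mixin["intro"]

    # Everyone else renders the top-level content, ignoring the mixin.
    fallback = {key: value for key, value in content.items() if key != "m.prompts"}
    return {"type": event.get("type"), "content": fallback}
```

Because `intro.type` is not required to match the top-level `type`, the sketch returns the type alongside the content so the caller can dispatch on it.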
Prompts SHOULD be rendered in order of the array, and appear below the `intro` rendering. `preset` | ||
prompts SHOULD be rendered as buttons using the provided `label`. `input` prompts SHOULD be rendered | ||
as text inputs, with `label` as a prefix or placeholder and validation per `validator`. For | ||
example: | ||
|
||
 | ||
|
||
[Codepen](https://codepen.io/turt2live/pen/gOyVvaY) (note: doesn't do validation) | ||
|
||
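Local validation of an `input` prompt could look roughly like this. Several things here are assumptions, since the proposal does not define them: the regex dialect (Python's `re` is used), whole-string matching via `re.fullmatch`, and accepting input when the pattern itself fails to compile so that the bot makes the final call. The name `is_valid_input` is hypothetical.

```python
import re
from typing import Any


def is_valid_input(prompt: dict[str, Any], text: str) -> bool:
    """Test user-typed text against an `input` prompt's optional `validator`."""
    validator = prompt.get("validator")
    if validator is None:
        # No validator: any input is acceptable, including an empty string.
        return True
    try:
        pattern = re.compile(validator)
    except re.error:
        # Assumption: a malformed pattern should not block the user locally.
        return True
    # Assumption: the whole input must match, not just a substring of it.
    return pattern.fullmatch(text) is not None
```

With the example's `[0-9]+d[0-9]+` validator, `2d20` passes while `2d20 please` does not.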
The user is then able to click on one of the buttons or submit text through the `input` option. That | ||
reply looks as follows: | ||
Comment on lines +128 to +129

What about if there was just a generic button type that if you clicked, would send an event? Even better, why can't other things be clicked as a conversation reply (or an argument to a slash command), such as another user, or another event in the room timeline, or another button from another event in the timeline. Being able to select another button in the timeline as a conversation reply or command argument would unlock a huge door, bots would be able to represent output as different button types. For example, Draupnir would be able to render each individual policy in the response to the equivalent of

All of this sounds wonderful and very much needed indeed, though the scope for the MSC needs to end somewhere. Expansion is possible through other MSCs currently, and as this one progresses maybe it brings concepts into it.

Well, it's my opinion that this would reduce the scope of this MSC and future ones by being able to express them using the "button type primitive" (alternatively presentation type). But I do understand how that can be seen differently.

I'm not sure I'm following the suggestion in that case, sorry. I understood your comment to mean supporting different types of input, not reusing buttons for everything (which I'm not sure how that'd even work). |
||
|
||
```jsonc | ||
{ | ||
"type": "m.conversation.reply", | ||
"sender": "@alice:example.org", | ||
"content": { | ||
"m.in_reply_to": { // TODO: Change to match Extensible Event replies | ||
"event_id": "$previousMessage", | ||
"rel_type": "m.thread" // yes, we use threads! | ||
}, | ||
// Whichever option the user clicked is described here in a new content block. | ||
"m.used_prompt": { | ||
"id": "surprise" | ||
}, | ||
// We then add all the fallback representations. For `preset` prompts, this is typically just | ||
// the `label` verbatim. `input` prompts may require some creative editing, like "Other: 2d20". | ||
"m.text": [{"body": "🎲❓"}], // fallback for the image | ||
"m.file": { | ||
"url": "mxc://example.org/abc123" | ||
}, | ||
"m.image_details": { | ||
"width": 16, | ||
"height": 16 | ||
}, | ||
"m.alt_text": { | ||
"m.text": [{"body": "An image of a 6 sided die with a red question mark over it"}] | ||
} | ||
} | ||
} | ||
``` | ||
|
||
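Building the reply content on the client side might be sketched as below. The function name `build_reply_content` is hypothetical. For `input` prompts, putting the typed text into the `m.text` fallback as "label: text" is an assumption based on the "Other: 2d20" hint, as the proposal's example only shows a `preset` reply.

```python
import copy
from typing import Any, Optional


def build_reply_content(
    prompt: dict[str, Any],
    prompt_event_id: str,
    input_text: Optional[str] = None,
) -> dict[str, Any]:
    """Build the `content` of an `m.conversation.reply` for the chosen prompt."""
    label = copy.deepcopy(prompt.get("label", {}))

    if prompt["type"] == "input":
        if input_text is None:
            raise ValueError("input prompts need the text the user typed")
        # Assumption: prefix the typed text with the label's first text body.
        texts = label.get("m.text") or [{"body": ""}]
        prefix = texts[0].get("body", "")
        body = f"{prefix}: {input_text}" if prefix else input_text
        fallback: dict[str, Any] = {"m.text": [{"body": body}]}
    else:
        # Preset prompts: reuse the label's content blocks verbatim.
        fallback = label

    return {
        **fallback,
        # Mirrors the proposal's example, including its open TODO on reply format.
        "m.in_reply_to": {"event_id": prompt_event_id, "rel_type": "m.thread"},
        "m.used_prompt": {"id": prompt["id"]},
    }
```

The relation and `m.used_prompt` keys are written last so a label can never overwrite them.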
The bot can then process this and continue the conversation as needed, using more `m.prompts` mixins | ||
to get the information it needs from the user. If the bot considers the conversation/thread to be | ||
complete, it sends an event with no `m.prompts` mixin to the thread. In our example of a dice bot, | ||
this could be the result of the roll. | ||
|
||
Once a user has picked (and sent) a prompt, the client SHOULD disable the user's ability to send | ||
another. This could be done by hiding all options, or using the HTML `disabled` attribute. | ||
|
||
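One way a client could decide whether to disable the prompts is to look for an earlier reply from the local user in the thread. This is a sketch only: `has_responded` is a hypothetical name, and `thread_events` is assumed to be the list of events the client already holds for that thread.

```python
from typing import Any


def has_responded(
    thread_events: list[dict[str, Any]],
    prompt_event_id: str,
    user_id: str,
) -> bool:
    """Return True if `user_id` already replied to the given prompt event."""
    for event in thread_events:
        if event.get("type") != "m.conversation.reply":
            continue
        if event.get("sender") != user_id:
            continue
        relation = event.get("content", {}).get("m.in_reply_to", {})
        if relation.get("event_id") == prompt_event_id:
            return True
    return False
```

When this returns True, the client can hide the options or render them disabled.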
The example dice bot would then start a new conversation by sending a new welcome message, likely | ||
with different text to feel less mechanical. For example: "What are we rolling next? [1d6] [...]". | ||
|
||
It is left as a bot implementation detail to handle multiple responses, responses from descoped | ||
users, and invalid input. Typically this would be handled by the bot using a threaded reply to the | ||
sender saying "sorry, you don't have permission to interact here" or "sorry, I didn't catch that. | ||
[same prompts as original message]". | ||
|
||
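A bot's handling of these cases could begin with a classification step like the one below. The function name, the result strings, and the `already_answered` set are all hypothetical. Re-validating `input` text is left out because the example reply only shows a `preset` prompt.

```python
from typing import Any


def classify_reply(
    prompt_event: dict[str, Any],
    reply_event: dict[str, Any],
    already_answered: set[str],
) -> str:
    """Classify a reply as "ok", "descoped", "duplicate", or "unknown_prompt"."""
    mixin = prompt_event["content"]["m.prompts"]
    sender = reply_event["sender"]

    # A present `scope` restricts who may answer; an absent one allows everyone.
    scope = mixin.get("scope")
    if scope is not None and sender not in scope:
        return "descoped"

    # `already_answered` holds the user IDs the bot has seen answer this prompt.
    if sender in already_answered:
        return "duplicate"

    used_id = reply_event["content"].get("m.used_prompt", {}).get("id")
    known_ids = {prompt["id"] for prompt in mixin.get("prompts", [])}
    if used_id not in known_ids:
        return "unknown_prompt"
    return "ok"
```

The bot can then map each result to a threaded response, such as repeating the original prompts for `"unknown_prompt"`.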
## Potential issues | ||
|
||
TODO | ||
|
||
## Alternatives | ||
|
||
[MSC3006](https://github.com/matrix-org/matrix-spec-proposals/pull/3006) is very similar to this | ||
proposal. Instead of starting per-message threads, it defines interactions via a state event. This | ||
makes MSC3006 more akin to a "conversation starter" replacement, to use this MSC's terminology. | ||
|
||
## Security considerations | ||
|
||
TODO | ||
|
||
## Unstable prefix | ||
|
||
While this proposal is not considered stable, clients should use `org.matrix.msc4139.` in place of | ||
`m.` in all identifiers. | ||
|
||
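The prefix substitution can be sketched as a small helper. It assumes that only the identifiers this proposal introduces (`m.prompts`, `m.conversation.reply`, `m.used_prompt`) are rewritten, and that identifiers from other proposals such as `m.text` are left alone; `to_unstable` is a hypothetical name.

```python
STABLE_PREFIX = "m."
UNSTABLE_PREFIX = "org.matrix.msc4139."

# Assumption: only identifiers introduced by this proposal get the unstable prefix.
MSC_IDENTIFIERS = {"m.prompts", "m.conversation.reply", "m.used_prompt"}


def to_unstable(identifier: str) -> str:
    """Map a stable identifier from this proposal to its unstable form."""
    if identifier in MSC_IDENTIFIERS:
        return UNSTABLE_PREFIX + identifier[len(STABLE_PREFIX):]
    return identifier
```

For example, `m.conversation.reply` becomes `org.matrix.msc4139.conversation.reply`.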
TODO: Language to support usage in room versions without Extensible Events support, similar to | ||
[MSC3381: Polls](https://github.com/matrix-org/matrix-spec-proposals/blob/main/proposals/3381-polls.md). | ||
|
||
## Dependencies | ||
|
||
This MSC has no direct dependencies. |
Implementation requirements:
Note that there is a prerequisite on Extensible Events which needs to be unpicked.