MSC4139: Bot buttons & conversations #4139

Draft: wants to merge 1 commit into base: main

201 changes: 201 additions & 0 deletions proposals/4139-bot-buttons.md
Member Author

Implementation requirements:

  • Client rendering prompts
  • Bot sending prompts

Note that there is a prerequisite on Extensible Events which needs to be unpicked.

# MSC4139: Bot buttons & conversations

Nearly all bots and bridges in the Matrix ecosystem use a text-based interface to support their
operations. These interfaces are typically highly structured commands and require the user to know
the entire incantation for the action they want to invoke, making them feel like "power user"
features.

Further, interacting with bots today is extremely transactional: the user sends a command and the
bot performs the action as-is or spews errors back at the user due to a typo. If an error was
returned, the entire command needs to be re-run.

A more user-friendly approach is to have the user provide the bot with information as needed,
without having to guess at the bot's current state. This proposal calls such an approach a
"conversation" with the bot - the user does something to "start" the conversation, and the bot
provides a limited set of prompts to continue the conversation. This repeats until the conversation
ends (usually by the bot saying so explicitly). Users may hold multiple concurrent conversations
with bots. Conversation starters are deliberately left as a bot implementation detail in this
proposal to allow the ecosystem to explore this new interaction technique. Examples may include the
user opening a DM with the bot, sending a `!command` message, or, in future, sending a slash command
like `/start`.

This conversation approach is heavily inspired by platforms like Telegram.

## Proposal

A new `m.prompts` [mixin](https://github.com/matrix-org/matrix-spec-proposals/blob/main/proposals/1767-extensible-events.md#mixins-specifically-allowed)
is specified which describes actions another user in the room can take to further the conversation.

The `m.prompts` mixin contains some scoping parameters, rendering hints, and the actual prompts
themselves. For example, when applied to an `m.message` event, the `m.prompts` may look like the
following:

*Note*: The JSON comments are normative, and irrelevant fields are not shown.

```jsonc
{
"type": "m.message",
"sender": "@bot:example.org",
"content": {
"m.text": [
{"body": "Hello! Say <code>!roll [dice]</code> to roll some dice.", "mimetype": "text/html"},
{"body": "Hello! Say `!roll [dice]` to get started."}
],
"m.prompts": {
// Clients which recognize `m.prompts` would use `intro` to render the event instead. This
// allows the remainder of the event to be a fallback for unsupported clients.
"intro": {
"type": "m.message",
"content": {
"m.text": [
{"body": "Hello! What would you like to roll today?"}
]
}
},
// These are the users who should see the `prompts`. Other users may see something like "you
// do not have permission to reply to this message" instead of prompts. `scope` is optional:
// when not supplied, all users who can see the message can respond. When an empty array, no
// one can respond. Clients SHOULD NOT show prompts to users who are descoped.
"scope": [
"@alice:example.org",
"@bob:example.org"
],
Comment on lines +55 to +62
Contributor

not needed but it would be really nice to generalise this with a whisper MSC.

Member Author

The intent here is to very much ensure interactions are visible to the room. Bots looking to whisper the user can utilize DMs for now.

// These are the options a user has. Note the 2 distinct types and 3 label approaches.
"prompts": [
{
// `type` is the prompt type: "preset" (show a button) or "input" (shown below)
"type": "preset",
// `id` is used by the bot to figure out what prompt the user picked. It is an opaque ID.
"id": "1d6",

This seems to indicate that interactions are stateful?

I'd been thinking about a similar-ish system, but making the client just send an m.interaction event with bot-defined content (possibly as a sub-key?)

eg. a bot would define something like

// other content stuffs ...
"interactions": [
  {
    "type": "command",
    "metadata": {
      "action": "roll",
      "dice_type": "1d6"
    }
  }
]

When a user clicks this command button, the client sends an event:

{
  "type": "m.interaction",
  "content": {
     "metadata": {
        "action": "roll",
        "dice_type": "1d6"
     }, 
     // reply stuff
  }
}

I think, personally, I'd love to see a system that sticks more to existing interaction systems in order to avoid extra maintenance burden, and to make it easier to onboard clients.

Member Author

Bots can GET the previous event if they don't keep state themselves.

// `label` is an extensible event with deliberately no `type`.
"label": {
"m.text": [{"body": "1 six sided die"}]
}
}, {
"type": "preset",
"id": "surprise",
"label": {
// This should render as an image event, hopefully
// Requires https://github.com/matrix-org/matrix-spec-proposals/pull/3552
"m.text": [{"body": "🎲❓"}], // fallback
"m.file": {
"url": "mxc://example.org/abc123"
},
"m.image_details": {
// Clients should impose maximums and minimums here.
"width": 16,
"height": 16
},
"m.alt_text": {
"m.text": [{"body": "An image of a 6 sided die with a red question mark over it"}]
}
}
}, {
"type": "input",
"id": "custom",
// Regex the client can use to test input locally. Optional - if not provided the client
// should accept *any* input, including an empty string.
"validator": "[0-9]+d[0-9]+", // `2d20`, etc
"label": {
"m.text": [{"body": "Other"}]
}
}
]
}
}
}
```

In this example, clients which don't support the mixin will see the old-style `!roll [dice]` help text,
allowing the user to continue interacting if needed. Over time, bots may wish to drop this fallback
style and instead use a message like `Hello! Your client doesn't support talking to me :(`.

Clients which do support `m.prompts` will instead render the `intro` object as the event. It's not
required that the `intro.type` matches the top level event `type`, though it is considered good
practice to do so. The `intro` block is primarily intended to allow senders to tailor their message
for supported clients, as the intent for this proposal is to discourage commands like `!roll` where
possible.
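
The shape of the mixin and the "render `intro` if supported" rule could be sketched in TypeScript as follows. The type and function names (`EventContent`, `Prompt`, `PromptsMixin`, `RenderableEvent`, `selectRenderable`) are hypothetical; only the JSON field names come from the example above. Treating `intro` as optional is an assumption, as the proposal does not say whether it is required.

```typescript
// Hypothetical type names; only the JSON field names come from the proposal.
interface EventContent {
  "m.text"?: { body: string; mimetype?: string }[];
  "m.prompts"?: PromptsMixin;
  [block: string]: unknown; // any other content blocks
}

interface Prompt {
  type: "preset" | "input";
  id: string;          // opaque to the client
  label: EventContent; // extensible event content with no `type`
  validator?: string;  // only shown on `input` prompts in the example
}

interface RenderableEvent {
  type: string;
  content: EventContent;
}

interface PromptsMixin {
  intro?: RenderableEvent; // assumed optional; the proposal does not say
  scope?: string[];        // absent: everyone may respond; empty: no one
  prompts: Prompt[];
}

// Pick what to render: `intro` for clients that understand `m.prompts`,
// otherwise the event itself, which acts as the fallback.
function selectRenderable(event: RenderableEvent, supportsPrompts: boolean): RenderableEvent {
  const intro = event.content["m.prompts"]?.intro;
  return supportsPrompts && intro !== undefined ? intro : event;
}
```

A client without `m.prompts` support never calls anything like this: it simply renders the top-level `m.text`, which is why the fallback lives outside the mixin.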

Prompts SHOULD be rendered in the order of the array and appear below the `intro` rendering.
`preset` prompts SHOULD be rendered as buttons using the provided `label`. `input` prompts SHOULD be
rendered as text inputs, with `label` as a prefix or placeholder and validation per `validator`. For
example:

![](./images/4139-01-dice-bot-welcome.png)

[Codepen](https://codepen.io/turt2live/pen/gOyVvaY) (note: doesn't do validation)
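
A client-side sketch of the two checks described so far, `scope` and `validator`, might look like the following. The function names are hypothetical. Matching the validator against the whole input (rather than any substring) is an assumption: the proposal does not say whether the regex is anchored.

```typescript
// Hypothetical minimal prompt shape; field names follow the proposal's example.
interface Prompt {
  type: "preset" | "input";
  id: string;
  validator?: string;
}

// `scope` semantics per the proposal: absent means everyone who can see the
// message may respond; an empty array means no one may respond.
function canRespond(scope: string[] | undefined, userId: string): boolean {
  return scope === undefined || scope.indexOf(userId) !== -1;
}

// Local validation for `input` prompts. Without a validator, any input is
// accepted, including the empty string.
function isValidInput(prompt: Prompt, input: string): boolean {
  if (prompt.type !== "input") return false;
  if (prompt.validator === undefined) return true;
  // Assumption: the whole input must match, so the pattern is wrapped in ^...$.
  return new RegExp(`^(?:${prompt.validator})$`).test(input);
}
```

With the example's `[0-9]+d[0-9]+` validator, `2d20` passes while `2d20 please` does not. An unanchored match would accept both, which is why the anchoring choice matters for bot authors.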

The user is then able to click on one of the buttons or submit text through the `input` option. That
reply looks as follows:
Comment on lines +128 to +129
Contributor

@Gnuxie Gnuxie May 6, 2024

What about if there was just a generic button type that if you clicked, would send an event?

Even better, why can't other things be clicked as a conversation reply (or an argument to a slash command), such as another user, or another event in the room timeline, or another button from another event in the timeline.

Being able to select another button in the timeline as a conversation reply or command argument would unlock a huge door, bots would be able to represent output as different button types. For example, Draupnir would be able to render each individual policy in the response to the equivalent of !rules matching @spam:example.com as a button, that could be selected as an argument in future commands or interactions.

Member Author

All of this sounds wonderful and very much needed indeed, though the scope for the MSC needs to end somewhere. Expansion is possible through other MSCs currently, and as this one progresses maybe it brings concepts into it.

Contributor

Well, it's my opinion that this would reduce the scope of this MSC and future ones by being able to express them using the "button type primitive" (alternatively presentation type). But I do understand how that can be seen differently.

Member Author

I'm not sure I'm following the suggestion in that case, sorry. I understood your comment to mean supporting different types of input, not reusing buttons for everything (which I'm not sure how that'd even work).


```jsonc
{
"type": "m.conversation.reply",
"sender": "@alice:example.org",
"content": {
"m.in_reply_to": { // TODO: Change to match Extensible Event replies
"event_id": "$previousMessage",
"rel_type": "m.thread" // yes, we use threads!
},
// Whichever option the user clicked is described here in a new content block.
"m.used_prompt": {
"id": "surprise"
},
// We then add all the fallback representations. For `preset` prompts, this is typically just
// the `label` verbatim. `input` prompts may require some creative editing, like "Other: 2d20".
"m.text": [{"body": "🎲❓"}], // fallback for the image
"m.file": {
"url": "mxc://example.org/abc123"
},
"m.image_details": {
"width": 16,
"height": 16
},
"m.alt_text": {
"m.text": [{"body": "An image of a 6 sided die with a red question mark over it"}]
}
}
}
```
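
As a rough sketch, a client could assemble the reply content from the chosen prompt like this. `buildReplyContent` and the type names are hypothetical. The relation block copies the example above, which is itself marked TODO. The proposal does not say where an `input` prompt's typed value is carried, so this sketch assumes it lives only in the `m.text` fallback, following the "Other: 2d20" hint.

```typescript
// Hypothetical types; field names follow the proposal's examples.
interface Label {
  "m.text"?: { body: string }[];
  [block: string]: unknown;
}

interface Prompt {
  type: "preset" | "input";
  id: string;
  label: Label;
}

function buildReplyContent(
  promptEventId: string,
  prompt: Prompt,
  input?: string,
): Record<string, unknown> {
  const base = {
    // Copied from the example; the proposal marks this relation shape as TODO.
    "m.in_reply_to": { event_id: promptEventId, rel_type: "m.thread" },
    "m.used_prompt": { id: prompt.id },
  };
  if (prompt.type === "preset") {
    // For `preset` prompts the fallback is typically the label verbatim.
    return { ...prompt.label, ...base };
  }
  // Assumption: the typed value travels in the text fallback, "Label: value".
  const labelText = prompt.label["m.text"]?.[0]?.body ?? prompt.id;
  return { ...base, "m.text": [{ body: `${labelText}: ${input ?? ""}` }] };
}
```

The returned object is the `content` of an `m.conversation.reply` event; sending it is left to whichever SDK the client already uses.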

The bot can then process this and continue the conversation as needed, using more `m.prompts` mixins
to get the information it needs from the user. If the bot considers the conversation/thread to be
complete, it sends an event with no `m.prompts` mixin to the thread. In our example of a dice bot,
this could be the result of the roll.

Once a user has picked (and sent) a prompt, the client SHOULD disable the user's ability to send
another. This could be done by hiding all options, or using the HTML `disabled` attribute.

The example dice bot would then start a new conversation by sending a new welcome message, likely
with different text to feel less mechanical. For example: "What are we rolling next? [1d6] [...]".

It is left as a bot implementation detail to handle multiple responses, responses from descoped
users, and invalid input. Typically this would be handled by the bot using a threaded reply to the
sender saying "sorry, you don't have permission to interact here" or "sorry, I didn't catch that.
[same prompts as original message]".
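
One way a bot might triage incoming replies is sketched below. None of this is mandated by the proposal: `classifyReply`, the `Outcome` labels, and the per-sender `answeredBy` list are all hypothetical, and the extracted `input` string is assumed to have been recovered by the bot from the reply (the proposal does not define how).

```typescript
// Hypothetical types; field names follow the proposal's examples.
interface Prompt {
  type: "preset" | "input";
  id: string;
  validator?: string;
}

interface PromptsMixin {
  scope?: string[];
  prompts: Prompt[];
}

type Outcome = "accept" | "descoped" | "duplicate" | "unknown_prompt" | "invalid_input";

function classifyReply(
  mixin: PromptsMixin,
  sender: string,
  promptId: string,
  input: string | undefined,
  answeredBy: string[], // senders who already answered this prompt event
): Outcome {
  // Absent scope: anyone may respond. Empty scope: no one may.
  if (mixin.scope !== undefined && mixin.scope.indexOf(sender) === -1) return "descoped";
  if (answeredBy.indexOf(sender) !== -1) return "duplicate";
  const prompt = mixin.prompts.filter((p) => p.id === promptId)[0];
  if (prompt === undefined) return "unknown_prompt";
  if (prompt.type === "input" && prompt.validator !== undefined) {
    // Assumption: whole-input match, mirroring what a client would check locally.
    const ok = new RegExp(`^(?:${prompt.validator})$`).test(input ?? "");
    if (!ok) return "invalid_input";
  }
  return "accept";
}
```

A bot would then map `descoped` and `invalid_input` to the threaded replies suggested above, and decide for itself whether `duplicate` is an error or is silently ignored. The bot must repeat these checks even when clients validate locally, since nothing stops a client from sending an arbitrary reply.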

## Potential issues

TODO

## Alternatives

[MSC3006](https://github.com/matrix-org/matrix-spec-proposals/pull/3006) is very similar to this
proposal. Instead of starting per-message threads, it defines interactions via a state event. This
makes MSC3006 more akin to a "conversation starter" replacement, to use this MSC's terminology.

## Security considerations

TODO

## Unstable prefix

While this proposal is not considered stable, clients should use `org.matrix.msc4139.` in place of
`m.` in all identifiers.
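
A small helper can keep the prefix swap in one place. This is a sketch under one assumption: that "all identifiers" means the identifiers this proposal introduces (`m.prompts`, `m.used_prompt`, `m.conversation.reply`), and not blocks defined elsewhere such as `m.text`. The function and constant names are hypothetical.

```typescript
const UNSTABLE_PREFIX = "org.matrix.msc4139.";

// Assumption: only identifiers introduced by this proposal are rewritten.
const MSC4139_IDENTIFIERS = ["m.prompts", "m.used_prompt", "m.conversation.reply"];

function unstableName(stable: string): string {
  if (MSC4139_IDENTIFIERS.indexOf(stable) === -1) return stable;
  // Drop the leading "m." and put the unstable prefix in its place.
  return UNSTABLE_PREFIX + stable.slice(2);
}
```

For example, `unstableName("m.prompts")` yields `org.matrix.msc4139.prompts`, while `unstableName("m.text")` is returned unchanged.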

TODO: Language to support usage in room versions without Extensible Events support, similar to
[MSC3381: Polls](https://github.com/matrix-org/matrix-spec-proposals/blob/main/proposals/3381-polls.md).

## Dependencies

This MSC has no direct dependencies.
Binary file added proposals/images/4139-01-dice-bot-welcome.png