From 4f3ee124091b267ce4425530906f9c37dc81b653 Mon Sep 17 00:00:00 2001 From: Kegan Dougal Date: Mon, 22 Dec 2014 10:46:50 +0000 Subject: [PATCH 1/7] Proposal for human ID rules. Includes handling of namespaces for bots, handling of capitalisation, spoof checks and escape sequences. --- drafts/human-id-rules.rst | 141 ++++++++++++++++++++++++++------------ 1 file changed, 99 insertions(+), 42 deletions(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index 914a9a42320..b31d188d6d9 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -1,5 +1,21 @@ This document outlines the format for human-readable IDs within matrix. +Summary +------- + - Human-readable IDs are Room Aliases and User IDs. + - They MUST be Unicode as UTF-8. + - If spoof checks fail, the user ID in question MUST be rewritten to be punycode + with an additional ``@`` prefix. + Room aliases cannot be rewritten. + - Spoof Checks: + - MUST NOT contain one of the 107 blacklisted characters on this list: + http://kb.mozillazine.org/Network.IDN.blacklist_chars + - MUST NOT contain characters from >1 language, defined by + http://cldr.unicode.org/ + - User IDs MUST NOT contain a ``:`` or start with a ``@`` or ``.`` + - Room aliases MUST NOT contain a ``:`` + - User IDs SHOULD be case-insensitive. + Overview -------- UTF-8 is quickly becoming the standard character encoding set on the web. As @@ -10,16 +26,16 @@ identify different users. In addition, there are non-printable characters which cannot be rendered by the end-user. This opens up a security vulnerability with phishing/spoofing of IDs, commonly known as a homograph attack. -Web browers encountered this problem when International Domain Names were +Web browsers encountered this problem when International Domain Names were introduced. A variety of checks were put in place in order to protect users. If an address failed the check, the raw punycode would be displayed to disambiguate the address. 
Similar checks are performed by home servers in -Matrix. However, Matrix does not use punycode representations, and so does not -show raw punycode on a failed check. Instead, home servers must outright reject -these misleading IDs. +Matrix in order to protect users. In the event of a failed check, the raw +punycode is displayed as the user ID along with a special escape sequence to +indicate the change. Types of human-readable IDs ---------------------------- +~~~~~~~~~~~~~~~~~~~~~~~~~~~ There are two main human-readable IDs in question: - Room aliases @@ -28,54 +44,95 @@ There are two main human-readable IDs in question: Room aliases look like ``#localpart:domain``. These aliases point to opaque non human-readable room IDs. These pointers can change, so there is already an issue present with the same ID pointing to a different destination at a later -date. +date. Checks SHOULD be applied to room aliases, but they cannot be renamed in +punycode as that would break the alias. As a result, the checks in this document +apply to user IDs, although HSes may wish to enforce them on room alias +creation. User IDs look like ``@localpart:domain``. These represent actual end-users, and unlike room aliases, there is no layer of indirection. This presents a much -greater concern with homograph attacks. - -Checks ------- -- Similar to web browsers. -- blacklisted chars (e.g. non-printable characters) -- mix of language sets from 'preferred' language not allowed. -- Language sets from CLDR dataset. -- Treated in segments (localpart, domain) -- Additional restrictions for ease of processing IDs. - - - Room alias localparts MUST NOT have ``#`` or ``:``. - - User ID localparts MUST NOT have ``@`` or ``:``. - -Rejecting ---------- -- Home servers MUST reject room aliases which do not pass the check, both on - GETs and PUTs. -- Home servers MUST reject user ID localparts which do not pass the check, both - on creation and on events. 
-- Any home server whose domain does not pass this check, MUST use their punycode - domain name instead of the IDN, to prevent other home servers rejecting you. -- Error code is ``M_FAILED_HUMAN_ID_CHECK``. (generic enough for both failing - due to homograph attacks, and failing due to including ``:`` s, etc) -- Error message MAY go into further information about which characters were +greater concern with homograph attacks. Checks MUST be applied to user IDs. + +Spoof Checks +------------ +First, each ID is split into segments (localpart/domain) around the ``:``. For +this reason, ``:`` is a reserved character and cannot be a localpart or domain +character. + +User IDs which start with an ``@`` are used as an escape sequence for failed +user IDs. As a result, the localpart MUST NOT start with an ``@`` in order to +avoid namespace clashes. + +The checks are similar to web browsers for IDNs. The first check is that the +segment MUST NOT contain a blacklisted character on this list: +http://kb.mozillazine.org/Network.IDN.blacklist_chars - NB: Even though +this is Mozilla, Chrome follows the same list as per +http://www.chromium.org/developers/design-documents/idn-in-google-chrome + +The second check is that it MUST NOT contain characters from more than 1 +language. This is defined by this dataset http://cldr.unicode.org/ and is +applied after stripping " 0-9, +, -, [, ], _, and the space character" +( http://www.chromium.org/developers/design-documents/idn-in-google-chrome ) + + +Consequences of a failed check +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +If a user ID fails the check, the user ID on the event is renamed. This is +possible because user IDs contain routing information. This doesn't require +extra work for clients, and users will see an odd user ID rather than a spoofed +name. Renaming is done in order to protect users of a given HS, so if a +malicious HS doesn't rename their IDs, it doesn't affect any other HS. 
+ +- The HS MAY reject the creation of the room alias or user ID. This is the + preferred choice but it is entirely benevolent: other HSes may not apply this + rule so checks on incoming events MUST still be applied. The error code returned + for the rejection is ``M_FAILED_HUMAN_ID_CHECK``, which is generic enough for + both failing due to homograph attacks, and failing due to including ``:`` s. + Error message MAY go into further information about which characters were rejected and why. -- Error message SHOULD contain a ``failed_keys`` key which contains an array - of strings which represent the keys which failed the check e.g:: - failed_keys: [ user_id, room_alias ] +- The HS MUST rename the localpart which failed the check. It SHOULD be + represented as punycode. The HS MUST prefix the punycode with the escape + sequence ``@`` on user ID localparts, e.g. ``@@somepunycode:domain``. Room + aliases do not need to be escaped, and indeed they cannot be, as the originating + HS will not understand the rewritten alias. If a HS renames a user ID, it MUST + be able to apply the reverse mapping in case the user wishes to communicate with + the ID which failed the check. -Other considerations --------------------- -- Basic security: Informational key on the event attached by HS to say "unsafe +Other rejected solutions for failed checks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +- Additional key: Informational key on the event attached by HS to say "unsafe ID". Problem: clients can just ignore it, and since it will appear only very rarely, easy to forget when implementing clients. -- Moderate security: Requires client handshake. Forces clients to implement +- Require client handshake: Forces clients to implement a check, else they cannot communicate with the misleading ID. However, this is extra overhead in both client implementations and round-trips. 
-- High security: Outright rejection of the ID at the point of creation / +- Reject event: Outright rejection of the ID at the point of creation / receiving event. Point of creation rejection is preferable to avoid the ID entering the system in the first place. However, malicious HSes can just allow the ID. Hence, other home servers must reject them if they see them in events. Client never sees the problem ID, provided the HS is correctly - implemented. -- High security decided; client doesn't need to worry about it, no additional - protocol complexity aside from rejection of an event. + implemented. However, it is difficult to ensure that ALL HSes will come to the + same conclusion (given the CLDR dataset does come out with new versions). + +Namespacing +----------- + +Bots +~~~~ +User IDs representing real users SHOULD NOT start with a ``.``. User IDs which +act on behalf of a real user (e.g. an IRC/XMPP bot) SHOULD start with a ``.``. +This namespaces real/generated user IDs. Further namespacing SHOULD be applied +based on the service being used, getting progressively more specific, similar to +event types: e.g. ``@.irc.freenode.matrix.:domain``. Ultimately, the +HS in question has control over their user ID namespace, so this is just a +recommendation. + +Additional recommendations +-------------------------- + +Capitalisation +~~~~~~~~~~~~~~ +User IDs SHOULD be case-insensitive. This SHOULD be applied based on the +capitalisation rules in the CLDR dataset: http://cldr.unicode.org/ + From 408a0519ec43c004c821ece05a9febd74b3e2b75 Mon Sep 17 00:00:00 2001 From: Kegsay Date: Mon, 22 Dec 2014 14:39:16 +0000 Subject: [PATCH 2/7] Update human-id-rules.rst Clarify position on capitalisation. 
--- drafts/human-id-rules.rst | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index b31d188d6d9..f28a5a6ab56 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -2,19 +2,19 @@ This document outlines the format for human-readable IDs within matrix. Summary ------- - - Human-readable IDs are Room Aliases and User IDs. - - They MUST be Unicode as UTF-8. - - If spoof checks fail, the user ID in question MUST be rewritten to be punycode - with an additional ``@`` prefix. - Room aliases cannot be rewritten. - - Spoof Checks: - - MUST NOT contain one of the 107 blacklisted characters on this list: - http://kb.mozillazine.org/Network.IDN.blacklist_chars - - MUST NOT contain characters from >1 language, defined by - http://cldr.unicode.org/ - - User IDs MUST NOT contain a ``:`` or start with a ``@`` or ``.`` - - Room aliases MUST NOT contain a ``:`` - - User IDs SHOULD be case-insensitive. +- Human-readable IDs are Room Aliases and User IDs. +- They MUST be Unicode as UTF-8. +- If spoof checks fail, the user ID in question MUST be rewritten to be punycode + with an additional ``@`` prefix. + Room aliases cannot be rewritten. +- Spoof Checks: + - MUST NOT contain one of the 107 blacklisted characters on this list: + http://kb.mozillazine.org/Network.IDN.blacklist_chars + - MUST NOT contain characters from >1 language, defined by + http://cldr.unicode.org/ +- User IDs MUST NOT contain a ``:`` or start with a ``@`` or ``.`` +- Room aliases MUST NOT contain a ``:`` +- User IDs SHOULD be case-insensitive. Overview -------- @@ -136,3 +136,7 @@ Capitalisation User IDs SHOULD be case-insensitive. This SHOULD be applied based on the capitalisation rules in the CLDR dataset: http://cldr.unicode.org/ +This check SHOULD be applied when the user ID is created, in order to prevent +registration with the same name and different capitalisations, e.g. 
+``@foo:bar`` vs ``@Foo:bar`` vs ``@FOO:bar``. + From f2422eae3f2a76071fb35d9643d39e95c4275848 Mon Sep 17 00:00:00 2001 From: Kegsay Date: Mon, 22 Dec 2014 14:41:32 +0000 Subject: [PATCH 3/7] Update human-id-rules.rst Moar clarify. --- drafts/human-id-rules.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index f28a5a6ab56..fd20ea81545 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -133,7 +133,7 @@ Additional recommendations Capitalisation ~~~~~~~~~~~~~~ -User IDs SHOULD be case-insensitive. This SHOULD be applied based on the +The home server SHOULD NOT allow two user IDs that differ only by case. This SHOULD be applied based on the capitalisation rules in the CLDR dataset: http://cldr.unicode.org/ This check SHOULD be applied when the user ID is created, in order to prevent From 37a7f2108e766cd89e84ac6c8285036de413e496 Mon Sep 17 00:00:00 2001 From: Kegsay Date: Mon, 22 Dec 2014 14:48:28 +0000 Subject: [PATCH 4/7] Update human-id-rules.rst Mention case canonicalisation on registration. --- drafts/human-id-rules.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index fd20ea81545..3fc5852039b 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -138,5 +138,6 @@ capitalisation rules in the CLDR dataset: http://cldr.unicode.org/ This check SHOULD be applied when the user ID is created, in order to prevent registration with the same name and different capitalisations, e.g. -``@foo:bar`` vs ``@Foo:bar`` vs ``@FOO:bar``. +``@foo:bar`` vs ``@Foo:bar`` vs ``@FOO:bar``. Home servers MAY canonicalise +the user ID to be completely lower-case if desired. 
From 3d5ec5eb15c5574c9fc155bf5ed5a5c1083c1345 Mon Sep 17 00:00:00 2001 From: Kegan Dougal Date: Tue, 13 Oct 2015 15:09:00 +0100 Subject: [PATCH 5/7] Updated to reflect more recent progress --- drafts/human-id-rules.rst | 203 +++++++++++++++++--------------------- 1 file changed, 93 insertions(+), 110 deletions(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index 3fc5852039b..b3178ee39b0 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -1,103 +1,101 @@ -This document outlines the format for human-readable IDs within matrix. - -Summary -------- -- Human-readable IDs are Room Aliases and User IDs. -- They MUST be Unicode as UTF-8. -- If spoof checks fail, the user ID in question MUST be rewritten to be punycode - with an additional ``@`` prefix. - Room aliases cannot be rewritten. -- Spoof Checks: - - MUST NOT contain one of the 107 blacklisted characters on this list: - http://kb.mozillazine.org/Network.IDN.blacklist_chars - - MUST NOT contain characters from >1 language, defined by - http://cldr.unicode.org/ -- User IDs MUST NOT contain a ``:`` or start with a ``@`` or ``.`` -- Room aliases MUST NOT contain a ``:`` -- User IDs SHOULD be case-insensitive. - -Overview --------- -UTF-8 is quickly becoming the standard character encoding set on the web. As -such, Matrix requires that all strings MUST be encoded as UTF-8. However, +Abstract +======== + +This document outlines the format for human-readable IDs within Matrix. + +Background +---------- +UTF-8 is the dominant character encoding for Unicode on the web. However, using Unicode as the character set for human-readable IDs is troublesome. There are many different characters which appear identical to each other, but would -identify different users. In addition, there are non-printable characters which -cannot be rendered by the end-user. This opens up a security vulnerability with +produce different IDs. 
In addition, there are non-printable characters which +cannot be rendered by the end-user. This creates an opportunity for phishing/spoofing of IDs, commonly known as a homograph attack. Web browsers encountered this problem when International Domain Names were introduced. A variety of checks were put in place in order to protect users. If an address failed the check, the raw punycode would be displayed to -disambiguate the address. Similar checks are performed by home servers in -Matrix in order to protect users. In the event of a failed check, the raw -punycode is displayed as the user ID along with a special escape sequence to -indicate the change. +disambiguate the address. -Types of human-readable IDs -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -There are two main human-readable IDs in question: +The human-readable IDs in Matrix are Room Aliases and User IDs. +Room aliases look like ``#localpart:domain``. These aliases point to opaque +non human-readable room IDs. These pointers can change to point at a different +room ID at any time. User IDs look like ``@localpart:domain``. These represent +actual end-users (there is no indirection). -- Room aliases -- User IDs +Proposal +======== -Room aliases look like ``#localpart:domain``. These aliases point to opaque -non human-readable room IDs. These pointers can change, so there is already an -issue present with the same ID pointing to a different destination at a later -date. Checks SHOULD be applied to room aliases, but they cannot be renamed in -punycode as that would break the alias. As a result, the checks in this document -apply to user IDs, although HSes may wish to enforce them on room alias -creation. - -User IDs look like ``@localpart:domain``. These represent actual end-users, and -unlike room aliases, there is no layer of indirection. This presents a much -greater concern with homograph attacks. Checks MUST be applied to user IDs. 
- -Spoof Checks ------------- -First, each ID is split into segments (localpart/domain) around the ``:``. For -this reason, ``:`` is a reserved character and cannot be a localpart or domain -character. - -User IDs which start with an ``@`` are used as an escape sequence for failed -user IDs. As a result, the localpart MUST NOT start with an ``@`` in order to -avoid namespace clashes. - -The checks are similar to web browsers for IDNs. The first check is that the -segment MUST NOT contain a blacklisted character on this list: -http://kb.mozillazine.org/Network.IDN.blacklist_chars - NB: Even though -this is Mozilla, Chrome follows the same list as per -http://www.chromium.org/developers/design-documents/idn-in-google-chrome - -The second check is that it MUST NOT contain characters from more than 1 -language. This is defined by this dataset http://cldr.unicode.org/ and is -applied after stripping " 0-9, +, -, [, ], _, and the space character" -( http://www.chromium.org/developers/design-documents/idn-in-google-chrome ) - - -Consequences of a failed check -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If a user ID fails the check, the user ID on the event is renamed. This is -possible because user IDs contain routing information. This doesn't require -extra work for clients, and users will see an odd user ID rather than a spoofed -name. Renaming is done in order to protect users of a given HS, so if a +User IDs and Room Aliases MUST be Unicode as UTF-8. Checks are performed on +these IDs by homeservers to protect users from phishing/spoofing attacks. 
+These checks are: + +User ID Localparts: + - MUST NOT contain a ``:`` or start with a ``@`` or ``.`` + - MUST NOT contain one of the 107 blacklisted characters on this list: + http://kb.mozillazine.org/Network.IDN.blacklist_chars + - After stripping " 0-9, +, -, [, ], _, and the space character it MUST NOT + contain characters from >1 language, defined by http://cldr.unicode.org/ + +Room Alias Localparts: + - MUST NOT contain a ``:`` + - MUST NOT contain one of the 107 blacklisted characters on this list: + http://kb.mozillazine.org/Network.IDN.blacklist_chars + - After stripping " 0-9, +, -, [, ], _, and the space character it MUST NOT + contain characters from >1 language, defined by http://cldr.unicode.org/ + + +In the event of a failed user ID check, well behaved homeservers MUST: +- Rewrite user IDs in the offending events to be punycode with an additional ``@`` + prefix **before** delivering them to clients. There are no guarantees for + consistency between homeserver ID checking implementations. As a result, user + IDs MUST be sent in their *original* form over federation. This can be done in + a stateless manner as the punycode form has no information loss. + +In the event of a failed room alias check, well behaved homeservers MUST: +- Send an HTTP status code 400 with an ``errcode`` of ``M_FAILED_HUMAN_ID_CHECK`` + to the client if the client is attempting to *create* this alias. +- Send an HTTP status code 400 with an ``errcode`` of ``M_FAILED_HUMAN_ID_CHECK`` + to the client if the client is attempting to *join* a room via this alias. + +Examples:: + + @ebаy:domain.com (Cyrillic 'a', everything else English) + @@xn--eby-7cd:domain.com (Punycode with additional '@') + +Homeservers SHOULD NOT allow two user IDs that differ only by case. 
This +SHOULD be applied based on the capitalisation rules in the CLDR dataset: +http://cldr.unicode.org/ + +This check SHOULD be applied when the user ID is created, in order to prevent +registration with the same name and different capitalisations, e.g. +``@foo:bar`` vs ``@Foo:bar`` vs ``@FOO:bar``. Home servers MAY canonicalise +the user ID to be completely lower-case if desired. + +Rationale +========= + +Each ID is split into segments (localpart/domain) around the ``:``. For +this reason, ``:`` is a reserved character and cannot be a localpart character. +The 107 blacklisted characters are used to prevent non-printable characters and +spaces from being used. The decision to ban characters from more than 1 language +matches the behaviour of Google Chrome for IDN handling. This is to protect +against common homograph attacks such as ebаy.com (Cyrillic "a", rest is +English). This would always result in a failed check. Even with this though +there are limitations. For example, сахар is entirely Cyrillic, whereas caxap is +entirely Latin. + +User ID localparts cannot start with ``@`` so that a namespace of localparts +beginning with ``@`` can be created. This namespace is used for user IDs which +fail the ID checks. A failed ID could look like ``@@xn--c1yn36f:domain.com``. + +If a user ID fails the check, the user ID on the event is renamed. This doesn't +require extra work for clients, and users will see an odd user ID rather than a +spoofed name. Renaming is done in order to protect users of a given HS, so if a malicious HS doesn't rename their IDs, it doesn't affect any other HS. -- The HS MAY reject the creation of the room alias or user ID. This is the - preferred choice but it is entirely benevolent: other HSes may not apply this - rule so checks on incoming events MUST still be applied. 
The error code returned - for the rejection is ``M_FAILED_HUMAN_ID_CHECK``, which is generic enough for - both failing due to homograph attacks, and failing due to including ``:`` s. - Error message MAY go into further information about which characters were - rejected and why. - -- The HS MUST rename the localpart which failed the check. It SHOULD be - represented as punycode. The HS MUST prefix the punycode with the escape - sequence ``@`` on user ID localparts, e.g. ``@@somepunycode:domain``. Room - aliases do not need to be escaped, and indeed they cannot be, as the originating - HS will not understand the rewritten alias. If a HS renames a user ID, it MUST - be able to apply the reverse mapping in case the user wishes to communicate with - the ID which failed the check. +Room aliases cannot be rewritten as punycode and sent to the HS the alias is +referring to as the HS will not necessarily understand the rewritten alias. Other rejected solutions for failed checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -115,29 +113,14 @@ Other rejected solutions for failed checks implemented. However, it is difficult to ensure that ALL HSes will come to the same conclusion (given the CLDR dataset does come out with new versions). -Namespacing ------------ - -Bots -~~~~ -User IDs representing real users SHOULD NOT start with a ``.``. User IDs which -act on behalf of a real user (e.g. an IRC/XMPP bot) SHOULD start with a ``.``. -This namespaces real/generated user IDs. Further namespacing SHOULD be applied -based on the service being used, getting progressively more specific, similar to -event types: e.g. ``@.irc.freenode.matrix.:domain``. Ultimately, the -HS in question has control over their user ID namespace, so this is just a -recommendation. - -Additional recommendations --------------------------- +Outstanding Problems +==================== Capitalisation -~~~~~~~~~~~~~~ -The home server SHOULD NOT allow two user IDs that differ only by case. 
This SHOULD be applied based on the -capitalisation rules in the CLDR dataset: http://cldr.unicode.org/ +-------------- -This check SHOULD be applied when the user ID is created, in order to prevent -registration with the same name and different capitalisations, e.g. -``@foo:bar`` vs ``@Foo:bar`` vs ``@FOO:bar``. Home servers MAY canonicalise -the user ID to be completely lower-case if desired. +The capitalisation rules outlined above are nice but do not fully resolve issues +where ``@alice:example.com`` tries to speak with ``@bob:domain.com`` using +``@Bob:domain.com``. It is up to ``domain.com`` to map ``Bob`` to ``bob`` in +a sensible way. From 0ab2d66ae2f9047f38e56c7336bbb3b98c102e87 Mon Sep 17 00:00:00 2001 From: Kegsay Date: Tue, 13 Oct 2015 15:11:17 +0100 Subject: [PATCH 6/7] Make it valid RST --- drafts/human-id-rules.rst | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index b3178ee39b0..3da575bc340 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -46,17 +46,17 @@ Room Alias Localparts: In the event of a failed user ID check, well behaved homeservers MUST: -- Rewrite user IDs in the offending events to be punycode with an additional ``@`` - prefix **before** delivering them to clients. There are no guarantees for - consistency between homeserver ID checking implementations. As a result, user - IDs MUST be sent in their *original* form over federation. This can be done in - a stateless manner as the punycode form has no information loss. + - Rewrite user IDs in the offending events to be punycode with an additional ``@`` + prefix **before** delivering them to clients. There are no guarantees for + consistency between homeserver ID checking implementations. As a result, user + IDs MUST be sent in their *original* form over federation. This can be done in + a stateless manner as the punycode form has no information loss. 
In the event of a failed room alias check, well behaved homeservers MUST: -- Send an HTTP status code 400 with an ``errcode`` of ``M_FAILED_HUMAN_ID_CHECK`` - to the client if the client is attempting to *create* this alias. -- Send an HTTP status code 400 with an ``errcode`` of ``M_FAILED_HUMAN_ID_CHECK`` - to the client if the client is attempting to *join* a room via this alias. + - Send an HTTP status code 400 with an ``errcode`` of ``M_FAILED_HUMAN_ID_CHECK`` + to the client if the client is attempting to *create* this alias. + - Send an HTTP status code 400 with an ``errcode`` of ``M_FAILED_HUMAN_ID_CHECK`` + to the client if the client is attempting to *join* a room via this alias. Examples:: @@ -98,7 +98,7 @@ Room aliases cannot be rewritten as punycode and sent to the HS the alias is referring to as the HS will not necessarily understand the rewritten alias. Other rejected solutions for failed checks -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------------------ - Additional key: Informational key on the event attached by HS to say "unsafe ID". Problem: clients can just ignore it, and since it will appear only very rarely, easy to forget when implementing clients. @@ -123,4 +123,3 @@ The capitalisation rules outlined above are nice but do not fully resolve issues where ``@alice:example.com`` tries to speak with ``@bob:domain.com`` using ``@Bob:domain.com``. It is up to ``domain.com`` to map ``Bob`` to ``bob`` in a sensible way. 
- From ee3fe989ca2fde6d968aba6e90013143e27ecc42 Mon Sep 17 00:00:00 2001 From: Kegsay Date: Tue, 13 Oct 2015 15:47:56 +0100 Subject: [PATCH 7/7] Linkify --- drafts/human-id-rules.rst | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drafts/human-id-rules.rst b/drafts/human-id-rules.rst index 3da575bc340..3eb1cbee09f 100644 --- a/drafts/human-id-rules.rst +++ b/drafts/human-id-rules.rst @@ -35,15 +35,20 @@ User ID Localparts: - MUST NOT contain one of the 107 blacklisted characters on this list: http://kb.mozillazine.org/Network.IDN.blacklist_chars - After stripping " 0-9, +, -, [, ], _, and the space character it MUST NOT - contain characters from >1 language, defined by http://cldr.unicode.org/ + contain characters from >1 language, defined by the `exemplar characters`_ + on http://cldr.unicode.org/ + +.. _exemplar characters: http://cldr.unicode.org/translation/characters#TOC-Exemplar-Characters Room Alias Localparts: - MUST NOT contain a ``:`` - MUST NOT contain one of the 107 blacklisted characters on this list: http://kb.mozillazine.org/Network.IDN.blacklist_chars - After stripping " 0-9, +, -, [, ], _, and the space character it MUST NOT - contain characters from >1 language, defined by http://cldr.unicode.org/ + contain characters from >1 language, defined by the `exemplar characters`_ + on http://cldr.unicode.org/ +.. _exemplar characters: http://cldr.unicode.org/translation/characters#TOC-Exemplar-Characters In the event of a failed user ID check, well behaved homeservers MUST: - Rewrite user IDs in the offending events to be punycode with an additional ``@`` @@ -79,11 +84,13 @@ Each ID is split into segments (localpart/domain) around the ``:``. For this reason, ``:`` is a reserved character and cannot be a localpart character. The 107 blacklisted characters are used to prevent non-printable characters and spaces from being used. 
The decision to ban characters from more than 1 language -matches the behaviour of Google Chrome for IDN handling. This is to protect +matches the behaviour of `Google Chrome for IDN handling`_. This is to protect against common homograph attacks such as ebаy.com (Cyrillic "a", rest is English). This would always result in a failed check. Even with this though there are limitations. For example, сахар is entirely Cyrillic, whereas caxap is -entirely Latin. +entirely Latin. + +.. _Google Chrome for IDN handling: https://www.chromium.org/developers/design-documents/idn-in-google-chrome User ID localparts cannot start with ``@`` so that a namespace of localparts beginning with ``@`` can be created. This namespace is used for user IDs which