Add more Unicode planes to regular_char #174

stasm · 2018-08-28T12:30:19Z

Let's extend the regular_char production to accept characters from outside the BMP. A good standard to follow is the Char production in the XML spec: https://www.w3.org/TR/REC-xml/#NT-Char.

fluent/syntax/grammar.mjs

Lines 411 to 417 in 3c7cd30

    
    /* Any Unicode character from BMP excluding C0 control characters, space,
     * surrogate blocks and non-characters (U+FFFE, U+FFFF).
     * Cf. https://www.w3.org/TR/REC-xml/#NT-Char
     * TODO Add characters from other planes: U+10000 to U+10FFFF.
     */
    let regular_char =
        charset("\u0021-\uD7FF\uE000-\uFFFD");


stasm · 2018-10-12T11:39:04Z

Extending the reference parser to support astral Unicode planes turned out to be easy thanks to Unicode-aware regexes in ES2015. I opened #179 with the proposed implementation.
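For readers who want to see why the `u` flag matters here, the following is a minimal sketch and not the code from #179. The names `BMP_ONLY`, `WITH_ASTRAL`, and `isRegularChar` are made up for illustration; only the character ranges come from the snippet quoted above and its TODO.

```javascript
// Hypothetical sketch; the actual change lives in #179.

// Without the `u` flag a regex operates on UTF-16 code units, so an astral
// character appears as two surrogates, neither of which is in this class.
const BMP_ONLY = /^[\u0021-\uD7FF\uE000-\uFFFD]$/;

// With the `u` flag the regex operates on code points, so the range
// U+10000 to U+10FFFF can be written directly with \u{...} escapes.
const WITH_ASTRAL =
    /^[\u{21}-\u{D7FF}\u{E000}-\u{FFFD}\u{10000}-\u{10FFFF}]$/u;

// Returns true if `ch` is exactly one code point allowed by the extended set.
function isRegularChar(ch) {
    return WITH_ASTRAL.test(ch);
}

const emoji = "\u{1F600}"; // an astral character (plane 1)
console.log(BMP_ONLY.test(emoji));  // BMP-only class rejects it
console.log(isRegularChar(emoji));  // extended class accepts it
console.log(isRegularChar(" "));    // space stays excluded
```

Lone surrogates remain excluded in this sketch because, under the `u` flag, an unpaired surrogate is treated as its own code point in the range U+D800 to U+DFFF, which the class does not cover.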

stasm · 2018-10-12T11:40:44Z

The definition of NT-Char in the XML spec comes with the following note:

Note:

    Document authors are encouraged to avoid "compatibility characters",
    as defined in section 2.3 of [Unicode]. The characters defined in the
    following ranges are also discouraged. They are either control characters
    or permanently undefined Unicode characters:

    [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDEF],
    [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
    [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
    [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
    [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
    [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
    [#x10FFFE-#x10FFFF].

Should we include something similar in the Fluent spec?
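To make the question concrete, here is one way a tool could flag the discouraged ranges from the XML note without rejecting them in the grammar. This is only a sketch under that assumption: `isDiscouraged` and `findDiscouraged` are hypothetical names and not part of the Fluent spec or the reference parser.

```javascript
// Hypothetical helpers; the ranges are copied from the XML spec note above.

// Returns true if the code point falls in one of the discouraged ranges.
function isDiscouraged(codePoint) {
    if (codePoint >= 0x7F && codePoint <= 0x84) return true;
    if (codePoint >= 0x86 && codePoint <= 0x9F) return true;
    if (codePoint >= 0xFDD0 && codePoint <= 0xFDEF) return true;
    // The remaining ranges are the last two code points of planes 1 to 16:
    // U+1FFFE, U+1FFFF, U+2FFFE, ..., U+10FFFE, U+10FFFF.
    return codePoint >= 0x1FFFE
        && codePoint <= 0x10FFFF
        && (codePoint & 0xFFFE) === 0xFFFE;
}

// Collects the discouraged code points found in `text`.
// Iterating a string with for...of yields whole code points, not code units.
function findDiscouraged(text) {
    const found = [];
    for (const ch of text) {
        const codePoint = ch.codePointAt(0);
        if (isDiscouraged(codePoint)) {
            found.push(codePoint);
        }
    }
    return found;
}
```

A linter-style warning built on something like `findDiscouraged` would keep the grammar permissive while still surfacing these characters to authors, which matches the advisory tone of the XML note.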

stasm mentioned this issue Aug 28, 2018

Remove backslash escapes from TextElement #123

Closed

stasm mentioned this issue Oct 12, 2018

Support astral Unicode characters in TextElements and StringLiteral #179

Merged

stasm closed this as completed in #179 Oct 16, 2018

stasm mentioned this issue Oct 16, 2018

Forbid C1 control chars in any_char #182

Closed

snyk-bot mentioned this issue Oct 26, 2019

[Snyk] Upgrade fluent from 0.6.4 to 0.12.0 ajesse11x/send#4

Open
