Restrictions on what can go in inline comments #1

bakkot · 2021-10-27T14:00:46Z

There is a base RegularExpressionLiteral grammar which we almost certainly can't change, and I think that will impose some limitations on what characters can be legal unescaped in comments. In particular, I think [ and] might need to be escaped, since the lexer thinks those represent character classes which must balance.

The text was updated successfully, but these errors were encountered:

rbuckton · 2021-10-27T20:34:55Z

I imagine that comments are more likely to be used via the RegExp constructor than in a RegularExpressionLiteral, though I would like to support (?#) in literals. This might require [] be escaped in addition to ()/ in a literal, though escaping for anything other than () is unnecessary in a raw string passed to the RegExp constructor.

It also doesn't seem like it would be difficult to modify RegularExpressionBody to have an additional balancing construct for (?# and ), though it seemed there was reticence to such a change that I don't fully understand.

bakkot · 2021-10-27T20:50:08Z

I suspect @waldemarhorwat and @erights are the best people to speak to any potential issues with modifying RegularExpressionBody.

erights · 2021-12-15T21:59:27Z

Every time there is ambiguity in how to lex/parse JS, we have seen that enable attacks. For example, the ambiguity around html-like comments enabled an attack on Caja. Some approaches to filtering code for some safety relies on an accurate tokenization without requiring a parse. For example, Caja did a conservative scan for all identifiers that might be a use occurrence of a variable. Given accurate tokenization, this is guaranteed to include all use occurrences of free variables. (Caja was actually even more conservative, picking up all identifiers, even in comments and literal strings. But this was expensive. If we had a tokenizers we were confident in, we could have used that.)

There is other old software that relies on being able to tokenize accurately, even if it can no longer parse. For example, some simple syntax highlighters. For example, there is an old macro library for JS that relies on being able to tokenize, followed by being able to bracket match. Bracket matching is a parsing level constraint that has also not been broken by the continuing evolution of JS. In fact, tokenizing template literals requires bracket matching, so stability of tokenization does seem to require the bracket-matching level of parsing as well.

rbuckton · 2021-12-16T00:40:19Z

In plenary today I mistakenly said, "we should ban x-mode for RegExp literals". However, that would not satisfy the constraint here as x-mode does not affect the behavior of (?#). We could merely require that [ and ] be escaped when in a comment in RegularExpressionBody as is proposed in the OP, but they don't necessarily need to be escaped when passed as a string to the RegExp constructor, just as / must be escaped in a RegularExpressionLiteral but does not need to be escaped when passed as a string to the constructor.

waldemarhorwat · 2021-12-23T00:10:21Z

Having the RegularExpressionBody grammar not be a cover grammar (modulo /) for the real regular expression grammar would be terrible.

The invariant is that any valid regular expression according to the RegExp grammar that does not contain an internal / must also be a valid RegularExpressionBody expansion. We carefully engineered the v mode to retain this invariant.

Introducing (?#) comments without the requirement of escaping [ and ] will break the invariant. In /ab(?#[)/i]/m, the regular expression should end at the first / but RegularExpressionBody will also swallow up some of the following characters.

brad4d mentioned this issue Oct 27, 2021

Consider a utility method for building regular expressions with comments instead of an 'x' flag #2

Open

rbuckton mentioned this issue Oct 19, 2022

Multiline Regexp by also ignoring linebreaks #5

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restrictions on what can go in inline comments #1

Restrictions on what can go in inline comments #1

bakkot commented Oct 27, 2021 •

edited

Loading

rbuckton commented Oct 27, 2021

bakkot commented Oct 27, 2021

erights commented Dec 15, 2021

rbuckton commented Dec 16, 2021

waldemarhorwat commented Dec 23, 2021

Restrictions on what can go in inline comments #1

Restrictions on what can go in inline comments #1

Comments

bakkot commented Oct 27, 2021 • edited Loading

rbuckton commented Oct 27, 2021

bakkot commented Oct 27, 2021

erights commented Dec 15, 2021

rbuckton commented Dec 16, 2021

waldemarhorwat commented Dec 23, 2021

bakkot commented Oct 27, 2021 •

edited

Loading