-
Notifications
You must be signed in to change notification settings - Fork 2
Restrictions on what can go in inline comments #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I imagine that comments are more likely to be used via the It also doesn't seem like it would be difficult to modify RegularExpressionBody to have an additional balancing construct for |
I suspect @waldemarhorwat and @erights are the best people to speak to any potential issues with modifying RegularExpressionBody. |
Every time there is ambiguity in how to lex/parse JS, we have seen that enable attacks. For example, the ambiguity around html-like comments enabled an attack on Caja. Some approaches to filtering code for some safety relies on an accurate tokenization without requiring a parse. For example, Caja did a conservative scan for all identifiers that might be a use occurrence of a variable. Given accurate tokenization, this is guaranteed to include all use occurrences of free variables. (Caja was actually even more conservative, picking up all identifiers, even in comments and literal strings. But this was expensive. If we had a tokenizers we were confident in, we could have used that.) There is other old software that relies on being able to tokenize accurately, even if it can no longer parse. For example, some simple syntax highlighters. For example, there is an old macro library for JS that relies on being able to tokenize, followed by being able to bracket match. Bracket matching is a parsing level constraint that has also not been broken by the continuing evolution of JS. In fact, tokenizing template literals requires bracket matching, so stability of tokenization does seem to require the bracket-matching level of parsing as well. |
In plenary today I mistakenly said, "we should ban x-mode for RegExp literals". However, that would not satisfy the constraint here as |
Having the RegularExpressionBody grammar not be a cover grammar (modulo The invariant is that any valid regular expression according to the RegExp grammar that does not contain an internal Introducing |
There is a base
RegularExpressionLiteral
grammar which we almost certainly can't change, and I think that will impose some limitations on what characters can be legal unescaped in comments. In particular, I think[
and]
might need to be escaped, since the lexer thinks those represent character classes which must balance.The text was updated successfully, but these errors were encountered: