Fix #2516, #3560: Unicode space handling #3774

lydell · 2015-01-06T20:50:01Z

It is possible to match only valid JavaScript identifiers with a really long
regex (as coco and CoffeeScriptRedux do), but CoffeeScript uses a much
simpler one, which allows a bit too much.

Quoting @jashkenas in #1718 (#issuecomment-2152464):

> But it still seems very much across the "worth it" line. You'll get the
> SyntaxError as soon as it hits JS, and performance aside -- even the increase
> in filesize for our browser coffee-script.js lib seems too much, considering
> this is something no one ever does, apart from experimentation.

In short, CoffeeScript treats any non-ASCII character as part of an identifier.
However, Unicode spaces should be excluded, since having blank characters as part
of a _word_ is very confusing. This commit does so, while still keeping the
regex really simple.
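To make the idea concrete, here is a minimal sketch in TypeScript. It is not the regex from this commit: the names `PERMISSIVE_IDENTIFIER`, `IDENTIFIER`, and `matchIdentifier` are hypothetical and chosen for illustration only. It assumes the approach described above: accept any non-ASCII character as an identifier character, but reject anything the regex engine's `\s` class considers whitespace (which in JavaScript includes U+00A0 and U+3000).

```typescript
// Illustrative sketch only; these patterns are assumptions, not the
// regexes used in the CoffeeScript lexer.

// Permissive form: every character from \x7f to \uffff counts as an
// identifier character, so Unicode spaces slip into identifiers.
const PERMISSIVE_IDENTIFIER = /^[$A-Za-z_\x7f-\uffff][$\w\x7f-\uffff]*/;

// Stricter form: same character class, but a negative lookahead rejects
// any character matched by \s, and the identifier may not start with a digit.
const IDENTIFIER = /^(?!\d)(?:(?!\s)[$\w\x7f-\uffff])+/;

// Returns the identifier at the start of `source`, or null if there is none.
function matchIdentifier(source: string, pattern: RegExp = IDENTIFIER): string | null {
  const m = pattern.exec(source);
  return m ? m[0] : null;
}
```

With the permissive pattern, `matchIdentifier("foo\u00A0bar", PERMISSIVE_IDENTIFIER)` swallows the non-breaking space and returns the whole string, while the stricter pattern stops at `foo`. The lookahead keeps the pattern short, which is the trade-off argued for above: no table of valid identifier code points, only an exclusion of whitespace.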

@jashkenas

jashkenas · 2015-01-06T21:10:57Z

Sounds alright. I guess you're imagining some really bizarre copy-and-paste scenarios?

lydell · 2015-01-06T21:19:21Z

Yes. Most importantly the “I accidentally pressed AltGr+Space which inserted a non-breaking space” scenario, though.

michaelficarra · 2015-01-07T02:59:48Z

👍

loveencounterflow · 2015-01-15T18:39:58Z

+1. BTW, it also happened to me when testing out some Chinese character processing stuff and not switching the keyboard back to Western before editing outside a string literal; you then get U+3000 ideographic spaces with the spacebar. I'd say there are quite a few whitespace characters out there you don't want to see except in a string (I'm +1 for ruling out tabs as indentation, too).

jashkenas added a commit that referenced this pull request Jan 6, 2015

Merge pull request #3774 from lydell/unicode-spaces

e769423

Fix #2516, #3560: Unicode space handling

jashkenas merged commit e769423 into jashkenas:master Jan 6, 2015

jashkenas added the change and fixed labels Jan 6, 2015

lydell deleted the unicode-spaces branch January 6, 2015 21:19

lydell mentioned this pull request Jan 15, 2015

compilation error with attribute access #3560

Closed
