Fix #2516, #3560: Unicode space handling #3774

Merged
merged 1 commit into from
Jan 6, 2015

@lydell (Collaborator) commented Jan 6, 2015

It is possible to match only valid JavaScript identifiers with a really long
regex (like coco and CoffeeScriptRedux do), but CoffeeScript uses a much
simpler one, which allows a bit too much.

Quoting jashkenas#1718 #issuecomment-2152464 @jashkenas:

> But it still seems very much across the "worth it" line. You'll get the
> SyntaxError as soon as it hits JS, and performance aside -- even the increase
> in filesize for our browser coffee-script.js lib seems too much, considering
> this is something no one ever does, apart from experimentation.

In short, CoffeeScript treats any non-ASCII character as part of an identifier.
However, Unicode spaces should be excluded, since having blank characters as part
of a _word_ is very confusing. This commit excludes them while still keeping the
regex really simple.
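
To illustrate the difference, here is a minimal sketch in plain JavaScript. The two patterns and the `matchIdentifier` helper are simplified, hypothetical stand-ins written for this example; they are not quoted from the commit and may differ from the lexer's real regex. The first pattern accepts every character from U+007F upward as an identifier character; the second adds a negative lookahead on `\s`, which in JavaScript also covers Unicode spaces such as U+00A0 and U+3000.

```javascript
// Hypothetical, simplified patterns for illustration only.

// Permissive: any character in U+007F..U+FFFF counts as part of an identifier.
const PERMISSIVE_IDENTIFIER = /^[$A-Za-z_\x7f-\uffff][$\w\x7f-\uffff]*/;

// Space-aware: same character class, but each character must not match \s.
// The leading (?!\d) keeps identifiers from starting with a digit.
const SPACE_AWARE_IDENTIFIER = /^(?!\d)(?:(?!\s)[$\w\x7f-\uffff])+/;

// Returns the identifier at the start of `source`, or null if there is none.
function matchIdentifier(regex, source) {
  const match = regex.exec(source);
  return match ? match[0] : null;
}

// "foo", a no-break space (U+00A0), then "bar".
const source = "foo\u00a0bar";

// The permissive pattern swallows the no-break space into one identifier.
console.log(matchIdentifier(PERMISSIVE_IDENTIFIER, source) === source); // true

// The space-aware pattern stops at the no-break space.
console.log(matchIdentifier(SPACE_AWARE_IDENTIFIER, source)); // "foo"
```

Non-space characters outside ASCII, such as the `é` in `héllo`, are still accepted by both patterns, so the check stays short instead of enumerating every valid identifier character.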

@jashkenas (Owner)

Sounds alright. I guess you're imagining some really bizarre copy-and-paste scenarios?

jashkenas added a commit that referenced this pull request Jan 6, 2015
@jashkenas jashkenas merged commit e769423 into jashkenas:master Jan 6, 2015

@lydell (Collaborator, Author) commented Jan 6, 2015

Yes. Most importantly the “I accidentally pressed AltGr+Space which inserted a non-breaking space” scenario, though.

@lydell lydell deleted the unicode-spaces branch January 6, 2015 21:19

@michaelficarra (Collaborator)

👍

@loveencounterflow

+1. BTW, it also happened to me when testing out some Chinese character processing stuff and not switching the keyboard back to Western before editing outside a string literal; you then get U+3000 ideographic spaces with the spacebar. I'd say there are quite a few whitespace characters out there you don't want to see except in a string (I'm +1 for ruling out tabs as indentation, too).
