Commit 9ec427b

Fix jashkenas#2516, jashkenas#3560: Unicode space handling
It is possible to match only valid JavaScript identifiers with a really long
regex (as coco and CoffeeScriptRedux do), but CoffeeScript uses a much simpler
one, which allows a bit too much. Quoting @jashkenas
(jashkenas#1718 #issuecomment-2152464):

> But it still seems very much across the "worth it" line. You'll get the
> SyntaxError as soon as it hits JS, and performance aside -- even the increase
> in filesize for our browser coffee-script.js lib seems too much, considering
> this is something no one ever does, apart from experimentation.

In short, CoffeeScript treats any non-ASCII character as part of an identifier.
However, Unicode spaces should be excluded, since having blank characters as
part of a _word_ is very confusing. This commit excludes them while still
keeping the regex really simple.

1 parent b70f657 commit 9ec427b

3 files changed: +29 -2 lines changed

lib/coffee-script/lexer.js

+1 -1
Some generated files are not rendered by default.

src/lexer.coffee

+2 -1
@@ -731,7 +731,8 @@ BOM = 65279

 # Token matching regexes.
 IDENTIFIER = /// ^
-  ( [$A-Za-z_\x7f-\uffff][$\w\x7f-\uffff]* )
+  (?!\d)
+  ( (?: (?!\s)[$\w\x7f-\uffff] )+ )
   ( [^\n\S]* : (?!:) )?  # Is this a property name?
 ///
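
The effect of the change can be checked outside the compiler. The following is a minimal sketch, not the generated `lib/coffee-script/lexer.js`: both regexes are hand-translated from the heregex in the hunk above (whitespace and comments stripped), and `leadingIdentifier` is a hypothetical helper introduced only for this illustration.

```javascript
// Hand-translated from the heregex in src/lexer.coffee; an approximation of
// what the compiler emits, not a copy of the generated file.
const OLD_IDENTIFIER = /^([$A-Za-z_\x7f-\uffff][$\w\x7f-\uffff]*)([^\n\S]*:(?!:))?/;
const NEW_IDENTIFIER = /^(?!\d)((?:(?!\s)[$\w\x7f-\uffff])+)([^\n\S]*:(?!:))?/;

// Hypothetical helper: return the identifier captured at the start of
// `source`, or null when the regex does not match there.
function leadingIdentifier(regex, source) {
  const match = regex.exec(source);
  return match ? match[1] : null;
}

const source = "a\u00a0b"; // `a`, U+00A0 NO-BREAK SPACE, `b`

// Old regex: U+00A0 falls inside \x7f-\uffff, so all three characters
// are swallowed into a single identifier.
console.log(JSON.stringify(leadingIdentifier(OLD_IDENTIFIER, source)));

// New regex: the (?!\s) lookahead rejects U+00A0, so the identifier ends at `a`.
console.log(JSON.stringify(leadingIdentifier(NEW_IDENTIFIER, source)));

// Non-ASCII letters are still accepted, and a leading digit is still rejected
// because of the (?!\d) lookahead.
console.log(leadingIdentifier(NEW_IDENTIFIER, "λ = 5"));
console.log(leadingIdentifier(NEW_IDENTIFIER, "1abc"));
```

The first character class of the old regex existed only to exclude a leading digit. Moving that check into `(?!\d)` lets a single repeated group carry the `(?!\s)` guard for every character, the first one included.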

test/compilation.coffee

+26
@@ -52,6 +52,32 @@ test "Issue #986: Unicode identifiers", ->
   λ = 5
   eq λ, 5

+test "#2516: Unicode spaces should not be part of identifiers", ->
+  a = (x) -> x * 2
+  b = 3
+  eq 6, a b # U+00A0 NO-BREAK SPACE
+  eq 6, a b # U+1680 OGHAM SPACE MARK
+  eq 6, a b # U+2000 EN QUAD
+  eq 6, a b # U+2001 EM QUAD
+  eq 6, a b # U+2002 EN SPACE
+  eq 6, a b # U+2003 EM SPACE
+  eq 6, a b # U+2004 THREE-PER-EM SPACE
+  eq 6, a b # U+2005 FOUR-PER-EM SPACE
+  eq 6, a b # U+2006 SIX-PER-EM SPACE
+  eq 6, a b # U+2007 FIGURE SPACE
+  eq 6, a b # U+2008 PUNCTUATION SPACE
+  eq 6, a b # U+2009 THIN SPACE
+  eq 6, a b # U+200A HAIR SPACE
+  eq 6, a b # U+202F NARROW NO-BREAK SPACE
+  eq 6, a b # U+205F MEDIUM MATHEMATICAL SPACE
+  eq 6, a　b # U+3000 IDEOGRAPHIC SPACE
+
+  # #3560: Non-breaking space (U+00A0) (before `'c'`)
+  eq 5, {c: 5}[ 'c' ]
+
+  # A line where every space in non-breaking
+  eq 1 + 12  
+
 test "don't accidentally stringify keywords", ->
   ok (-> this == 'this')() is false
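
The fix relies on JavaScript's `\s` class covering every space listed in the test above. A small sketch of that assumption follows: the code points are copied from the test comments, the regex is again a hand translation of the new heregex, and the names `UNICODE_SPACES`, `missed` and `captured` are made up for this example.

```javascript
// Code points copied from the comments in test/compilation.coffee.
const UNICODE_SPACES = [
  0x00a0, 0x1680, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005,
  0x2006, 0x2007, 0x2008, 0x2009, 0x200a, 0x202f, 0x205f, 0x3000,
];

// Hand-translated from the new heregex in src/lexer.coffee.
const IDENTIFIER = /^(?!\d)((?:(?!\s)[$\w\x7f-\uffff])+)([^\n\S]*:(?!:))?/;

// Code points that `\s` fails to match; any entry here would still be
// lexed as an identifier character.
const missed = UNICODE_SPACES.filter(
  (codePoint) => !/\s/.test(String.fromCodePoint(codePoint))
);

// For each space, match against `a<space>b` and keep the captured identifier.
const captured = UNICODE_SPACES.map(
  (codePoint) => IDENTIFIER.exec(`a${String.fromCodePoint(codePoint)}b`)[1]
);

console.log(missed.length);
console.log(captured.every((name) => name === "a"));
```

On an engine that follows the ECMAScript definition of `\s` (white space plus line terminators, including every Unicode "Space_Separator" character), `missed` should be empty and every captured identifier should be just `a`.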
