
Commit 14f0939

Change OUTDENT tokens to be positioned at the end of the previous token
This commit adds another post-processing step after normal lexing that sets the locationData on all OUTDENT tokens to the last character of the previous token.

This does feel like a bit of a hack. Ideally the location data would be set correctly in the first place rather than in a post-processing step, but I tried that and some temporary intermediate tokens were causing problems, so I decided to set the location data after those intermediate tokens have been removed. Having this as a separate processing step also makes it more robust and isolated.

This fixes the problem in decaffeinate/decaffeinate#371. In that issue, the CoffeeScript tokens had three OUTDENT tokens in a row, and the last two overlapped with the `]`. Since at least one of those OUTDENT tokens was considered part of the function body, the function expression ended up with an ending position just past the end of the `]`.

OUTDENT tokens are a weird case in the lexer anyway, since they often don't correspond to an actual location in the source code. The code in `lexer.coffee` makes an attempt at finding a good place for them, but in some cases it picks a bad one, and that seems hard to avoid in the general case. For example, in this code:

```coffee
[-> a]
```

there must be an OUTDENT between the `a` and the `]`, but CoffeeScript tokens have an inclusive start and end, so they must always be at least one character wide (I think). In this case, the lexer was choosing the `]` as the location, and the parser still generated correct location data, I believe because it ignores the outermost INDENT and OUTDENT tokens. With multiple OUTDENT tokens in a row, however, the parser ends up producing location data that is wrong.

There doesn't seem to be a solid answer to "what location does an OUTDENT token have?", since it hasn't mattered much before, but for this commit I'm defining it: an OUTDENT always has the location of the last character of the previous token. This should be fairly safe because the tokens keep the same order relative to each other. It does make the start location of OUTDENT tokens awkward, but OUTDENT tokens are always used to mark the end of something, so their `last_line` and `last_column` values are what matter when determining AST node bounds, and it is most important for those to be correct.
1 parent (133fadd) · commit 14f0939
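
To see the placement problem the commit message describes, here is a quick sketch for dumping tokens and their location data (it assumes the `coffee-script` npm module of this era; the exact token stream varies by CoffeeScript version):

```coffee
# Sketch: print each token's tag, value, and location range for the
# `[-> a]` example, to see where the lexer places the OUTDENT token.
CoffeeScript = require 'coffee-script'

for token in CoffeeScript.tokens '[-> a]'
  [tag, value, loc] = token
  console.log tag, JSON.stringify(value), "#{loc.first_line}:#{loc.first_column}-#{loc.last_line}:#{loc.last_column}"
```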

File tree

3 files changed: +51, -0 lines changed


lib/coffee-script/rewriter.js: +18 lines (generated file; diff not rendered by default)

src/rewriter.coffee: +15 lines
```diff
@@ -33,6 +33,7 @@ class exports.Rewriter
     @tagPostfixConditionals()
     @addImplicitBracesAndParens()
     @addLocationDataToGeneratedTokens()
+    @fixOutdentLocationData()
     @tokens
 
   # Rewrite the token stream, looking one token ahead and behind.
@@ -368,6 +369,20 @@ class exports.Rewriter
         last_column: column
       return 1
 
+  # OUTDENT tokens should always be positioned at the last character of the
+  # previous token, so that AST nodes ending in an OUTDENT token end up with a
+  # location corresponding to the last "real" token under the node.
+  fixOutdentLocationData: ->
+    @scanTokens (token, i, tokens) ->
+      return 1 unless token[0] is 'OUTDENT'
+      prevLocationData = tokens[i - 1][2]
+      token[2] =
+        first_line: prevLocationData.last_line
+        first_column: prevLocationData.last_column
+        last_line: prevLocationData.last_line
+        last_column: prevLocationData.last_column
+      return 1
+
   # Because our grammar is LALR(1), it can't handle some single-line
   # expressions that lack ending delimiters. The **Rewriter** adds the implicit
   # blocks, so it doesn't need to. To keep the grammar clean and tidy, trailing
```
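
The new pass piggybacks on `scanTokens`, the rewriter's generic traversal helper: the callback's return value tells the walk how many tokens to advance, which is why every branch ends in `return 1`. Below is a rough sketch of that traversal based only on how the diff above uses it; it is not the actual rewriter source, and `scanTokensSketch` is a made-up name:

```coffee
# Rough sketch of a scanTokens-style walk: advance by whatever the
# callback returns, so callbacks that splice tokens in or out can
# skip past the region they just rewrote.
scanTokensSketch = (tokens, block) ->
  i = 0
  i += block(tokens[i], i, tokens) while i < tokens.length
  tokens
```

With that shape, `fixOutdentLocationData` can always return 1, since it only rewrites location data and never inserts or removes tokens.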

test/location.coffee: +18 lines
```diff
@@ -450,6 +450,24 @@ test "#3621: Multiline regex and manual `Regex` call with interpolation should
   eq tokenA.origin?[1], tokenB.origin?[1]
   eq tokenA.stringEnd, tokenB.stringEnd
 
+test "Verify OUTDENT tokens are located at the end of the previous token", ->
+  source = '''
+    SomeArr = [ ->
+      if something
+        lol =
+          count: 500
+    ]
+  '''
+  tokens = CoffeeScript.tokens source
+  [..., number, curly, outdent1, outdent2, outdent3, bracket, terminator] = tokens
+  eq number[0], 'NUMBER'
+  for outdent in [outdent1, outdent2, outdent3]
+    eq outdent[0], 'OUTDENT'
+    eq outdent[2].first_line, number[2].last_line
+    eq outdent[2].first_column, number[2].last_column
+    eq outdent[2].last_line, number[2].last_line
+    eq outdent[2].last_column, number[2].last_column
+
 test "Verify all tokens get a location", ->
   doesNotThrow ->
     tokens = CoffeeScript.tokens testScript
```
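
To see the effect of the fix on this exact input outside the test harness, one can dump the tail of the token stream. This is a sketch assuming the `coffee-script` module and the seven-token tail that the destructuring above expects:

```coffee
# Sketch: print the last seven tokens of the test source; after the fix,
# the three consecutive OUTDENT tokens share the NUMBER token's end position.
CoffeeScript = require 'coffee-script'

source = '''
  SomeArr = [ ->
    if something
      lol =
        count: 500
  ]
'''

for token in CoffeeScript.tokens(source)[-7..]
  console.log token[0], JSON.stringify token[2]
```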
