fix: change OUTDENT tokens to be positioned at the end of the previous token #4

alangpierce · 2016-08-11T05:42:40Z

This commit adds another post-processing step after normal lexing that sets the
locationData on all OUTDENT tokens to be at the last character of the previous
token. This feels like a bit of a hack. Ideally the location data would be set
correctly in the first place rather than in a post-processing step, but when I
tried that, some temporary intermediate tokens caused problems, so I decided to
set the location data once those intermediate tokens were removed. Having this
as a separate processing step also makes it more robust and isolated.
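
As a rough illustration of that post-processing step, here is a minimal sketch in TypeScript. It is a simplified model, not the code in this commit: the `Token` shape and the `fixOutdentLocations` name are assumptions made for the example, and only the `first_line`/`first_column`/`last_line`/`last_column` field names come from CoffeeScript's location data.

```typescript
// Simplified model of a token and its location data (illustrative only).
interface LocationData {
  first_line: number;
  first_column: number;
  last_line: number;
  last_column: number;
}

interface Token {
  tag: string;
  locationData: LocationData;
}

// Hypothetical post-processing pass: place every OUTDENT on the last
// character of the token that precedes it. Returns a new array and leaves
// the input untouched.
function fixOutdentLocations(tokens: Token[]): Token[] {
  const result: Token[] = [];
  for (const token of tokens) {
    const prev = result[result.length - 1];
    if (token.tag !== "OUTDENT" || prev === undefined) {
      result.push(token);
      continue;
    }
    // Consecutive OUTDENTs all collapse onto the same character, because
    // the previous OUTDENT has already been relocated by this point.
    const { last_line, last_column } = prev.locationData;
    result.push({
      ...token,
      locationData: {
        first_line: last_line,
        first_column: last_column,
        last_line,
        last_column,
      },
    });
  }
  return result;
}
```

Because the pass only reads the preceding token, it can run after any intermediate tokens have been dropped, which is the property the description above relies on.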

This fixes the problem in decaffeinate/decaffeinate#371.
In that issue, the CoffeeScript tokens had three OUTDENT tokens in a row, and
the last two overlapped with the `]`. Since at least one of those OUTDENT tokens
was considered part of the function body, the function expression had an ending
position just after the end of the `]`.

OUTDENT tokens are sort of a weird case in the lexer anyway, since they often
don't correspond to an actual location in the source code. It seems like the
code in `lexer.coffee` makes an attempt at finding a good place for them, but in
some cases it produces a bad result. This seems hard to avoid in the general
case. For example, in this code:

```coffee
[->
  a]
```

There must be an OUTDENT between the `a` and the `]`, but CoffeeScript tokens
have an inclusive start and end, so they must always be at least one character
wide (I think). In this case, the lexer was choosing the `]` as the location,
and the parser ended up generating correct location data, I believe because
it ignores the outermost INDENT and OUTDENT tokens. However, with multiple
OUTDENT tokens in a row, the parser ends up producing location data that is
wrong.
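
To make the inclusive-range constraint concrete, here is a small TypeScript sketch. The `inclusiveWidth` helper, the variable names, and the 0-based line and column numbers are assumptions for illustration; they are not taken from the lexer.

```typescript
interface LocationData {
  first_line: number;
  first_column: number;
  last_line: number;
  last_column: number;
}

// Number of characters covered by a single-line token with inclusive bounds.
// Returns null for multi-line tokens, which this sketch does not handle.
function inclusiveWidth(loc: LocationData): number | null {
  if (loc.first_line !== loc.last_line) {
    return null;
  }
  return loc.last_column - loc.first_column + 1;
}

// The second line of the example is "  a]": `a` sits at column 2 and `]` at
// column 3 (0-based lines and columns are an assumption of this sketch).
const identifierA: LocationData = { first_line: 1, first_column: 2, last_line: 1, last_column: 2 };
const closeBracket: LocationData = { first_line: 1, first_column: 3, last_line: 1, last_column: 3 };

// The OUTDENT has no character of its own, so it has to borrow one: either
// the `]` (the placement described above) or the `a` (the rule in this PR).
const outdentOnBracket: LocationData = { ...closeBracket };
const outdentOnPrevious: LocationData = { ...identifierA };
```

Even when `first_column` equals `last_column`, the width is 1, so an OUTDENT cannot be represented as a zero-width position between `a` and `]`.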

It seems to me like there isn't a solid answer to "what location do OUTDENT
tokens have", since it hasn't mattered much, but for this commit, I'm defining
it: they always have the location of the last character of the previous token.
This should hopefully be fairly safe because tokens are still in the same order
relative to each other. It's worth noting that this makes the start location
for OUTDENT tokens awkward. However, OUTDENT tokens are always used to mark the
end of something, so their `last_line` and `last_column` values are what matter
when determining AST node bounds, and those are the most important values to
get right.
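
The effect on node bounds can be sketched as follows in TypeScript. This is a deliberately simplified model in which a node spans from the start of its first token to the end of its last token; the `nodeBounds` helper, the token shapes, and the specific positions are assumptions for illustration, not the parser's actual code.

```typescript
interface LocationData {
  first_line: number;
  first_column: number;
  last_line: number;
  last_column: number;
}

interface Token {
  tag: string;
  locationData: LocationData;
}

// Hypothetical helper: a node covering a run of tokens starts where the
// first token starts and ends where the last token ends.
function nodeBounds(tokens: Token[]): LocationData {
  if (tokens.length === 0) {
    throw new Error("a node needs at least one token");
  }
  const first = tokens[0].locationData;
  const last = tokens[tokens.length - 1].locationData;
  return {
    first_line: first.first_line,
    first_column: first.first_column,
    last_line: last.last_line,
    last_column: last.last_column,
  };
}

// Tokens for the function in "[->\n  a]", with 0-based positions (assumed).
const arrow: Token = { tag: "->", locationData: { first_line: 0, first_column: 1, last_line: 0, last_column: 2 } };
const ident: Token = { tag: "IDENTIFIER", locationData: { first_line: 1, first_column: 2, last_line: 1, last_column: 2 } };

// OUTDENT placed on the `]` versus on the last character of the previous token.
const outdentOnBracket: Token = { tag: "OUTDENT", locationData: { first_line: 1, first_column: 3, last_line: 1, last_column: 3 } };
const outdentOnPrevious: Token = { tag: "OUTDENT", locationData: { first_line: 1, first_column: 2, last_line: 1, last_column: 2 } };

const boundsBefore = nodeBounds([arrow, ident, outdentOnBracket]);
const boundsAfter = nodeBounds([arrow, ident, outdentOnPrevious]);
```

In this model `boundsBefore` ends on the `]` while `boundsAfter` ends on the `a`. The OUTDENT's start position never contributes, since an OUTDENT is never the first token of a node.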


eventualbuddha · 2016-08-11T13:32:55Z

Released in v1.10.0-patch2.

eventualbuddha merged commit dcf3310 into decaffeinate:decaffeinate-fork-1.10.0 Aug 11, 2016

alangpierce mentioned this pull request Aug 14, 2016

Emits ending-square-bracket in the wrong place decaffeinate/decaffeinate#371

Closed

alangpierce mentioned this pull request Aug 23, 2016

Change OUTDENT tokens to be positioned at the end of the previous token jashkenas/coffeescript#4296

Merged

This was referenced Oct 3, 2016

"unmatched }" error on some deeply-nested expressions decaffeinate/decaffeinate#446

Closed

Fix location data for implicit CALL_END tokens #6

Merged
