Optional chaining #137

mjambon · 2020-08-13T03:53:36Z

This adds partial (incorrect) support for optional chaining as discussed in #134.

It kind of works because I specified creative tokens ?.[ and ?.( in the grammar, and the test cases don't have whitespace after the ?.. The problem is that if I specify a proper ?. token in front of the [ of a subscript (or in front of the ( of call arguments), I obtain an ERROR node in the CST in place of the ?.. I don't know what to touch in terms of precedence or other tricks to make this work. I'll resume tomorrow.

The current implementation also allows some optional chains as left-hand side expressions, which is illegal, but I know how to fix it (requires creating separate rules optional_{member|subscript|call} as already suggested by @maxbrunsfeld in #134.

….(x).

mjambon · 2020-09-09T04:40:26Z

Hey there, @maxbrunsfeld and team. I'm back with my attempts to add optional chaining to javascript (and typescript). With some trial and error, I managed to pass my tests for property access (a?.b) and array access (a?.[b]). However I didn't manage to make the function calls work (f?.(x)) without a runtime error.

Those are all small changes and I hope we can implement the optional function calls without changing too much but I'm not sure what to try next. I had a brief look at the official typescript parser but it's huge and handwritten in somewhat procedural style. My next best thought is to find another javascript parser that may be written in a more declarative style, and try to imitate what they. I'll take any suggestions.

mjambon · 2020-09-11T00:06:52Z

My colleague @nbrahms added support for f?.(x) using a single token for ?.(. We hope it covers most cases in practice. We'll start using this internally and will report if we see any problem. Optional chaining is widely used in typescript (about 10% of the files we looked at), so these changes should really help.

maxbrunsfeld

This is looking good; thank you both for fixing this! I suggested a couple of small changes.

One slightly larger change I'd like to discuss: in member_expression and subscript_expression, the optional operator ?. is just added to the existing rule, instead of adding a new named rule (like optional_member_expression etc). But in the case of call_expression, you've added a separate named opt_arguments node.

If possible (and if it works for your use case), I'd like to handle it the same way across all three cases (member, subscript, and call).

What do you think about removing opt_arguments and just adding ?. as an optional token inside call expression? Or conversely, adding separate optional_* variations of subscript and member expression? I think I would prefer the first option (and it will result in a smaller binary size), but I am open to the second as well.

maxbrunsfeld · 2020-09-14T19:35:10Z

grammar.js

    )),

+    opt_arguments: $ => prec(PREC.CALL, seq(
+      '?.(',


Can you split the ?. into a separate token? I think whitespace is most likely allowed between ?. and (.

I think @mjambon mentioned that but this led to parsing errors when applying it on a concrete file.

Exactly right. Using seq('?.', '(', ...) did not parse.

What specifically was failing to parse? I'm trying this locally, with the following added unit test, and everything seems to parse ok:

============================================ Optional function calls ============================================ a[b]?.(c); d.e?.(f); --- (program (expression_statement (call_expression (subscript_expression (identifier) (identifier)) (opt_arguments (identifier)))) (expression_statement (call_expression (member_expression (identifier) (property_identifier)) (opt_arguments (identifier)))))

This is my grammar diff:

--- a/grammar.js +++ b/grammar.js @@ -685,7 +685,8 @@ module.exports = grammar({ $.super, alias($._reserved_identifier, $.identifier) )), - choice('[', seq('?.', '[')), + optional('?.'), + '[', field('index', $._expressions), ']' )), @@ -952,7 +953,8 @@ module.exports = grammar({ )), opt_arguments: $ => prec(PREC.CALL, seq( - '?.(', + '?.', + '(', commaSep(optional(choice($._expression, $.spread_element))), ')' )),

Hi @maxbrunsfeld

Using your grammar.js diff, and a test file of:

x?.(y); x ?. (y); x?.y; x?.[0]; new function bob (a, b) {} ?.(1, 2);

I got:

➜ tree-sitter-javascript git:(optional-chains) ✗ npx tree-sitter parse opt.js (program [0, 0] - [9, 0] (expression_statement [0, 0] - [0, 7] (call_expression [0, 0] - [0, 6] function: (identifier [0, 0] - [0, 1]) (ERROR [0, 1] - [0, 3]) arguments: (arguments [0, 3] - [0, 6] (identifier [0, 4] - [0, 5])))) (expression_statement [2, 0] - [2, 9] (call_expression [2, 0] - [2, 8] function: (identifier [2, 0] - [2, 1]) (ERROR [2, 2] - [2, 4]) arguments: (arguments [2, 5] - [2, 8] (identifier [2, 6] - [2, 7])))) (expression_statement [4, 0] - [4, 5] (member_expression [4, 0] - [4, 4] object: (identifier [4, 0] - [4, 1]) property: (property_identifier [4, 3] - [4, 4]))) (expression_statement [6, 0] - [6, 7] (subscript_expression [6, 0] - [6, 6] object: (identifier [6, 0] - [6, 1]) index: (number [6, 4] - [6, 5]))) (expression_statement [8, 0] - [8, 36] (call_expression [8, 0] - [8, 35] function: (new_expression [8, 0] - [8, 26] constructor: (function [8, 4] - [8, 26] name: (identifier [8, 13] - [8, 16]) parameters: (formal_parameters [8, 17] - [8, 23] (identifier [8, 18] - [8, 19]) (identifier [8, 21] - [8, 22])) body: (statement_block [8, 24] - [8, 26]))) arguments: (opt_arguments [8, 27] - [8, 35] (number [8, 30] - [8, 31]) (number [8, 33] - [8, 34]))))) opt.js 0 ms (ERROR [0, 1] - [0, 3])```

Note that tree-sitter seems to be preferring the $.arguments rule instead of $.opt_arguments here. I played with various prec settings, but that didn't help me out any.

@maxbrunsfeld now if you run the tests, you should see an ERROR node where the ?. is:

1 failure: expected / actual 1. Optional function calls: (program (expression_statement (call_expression (identifier) (ERROR) (arguments (identifier)))) (expression_statement (call_expression (subscript_expression (identifier) (identifier)) (opt_arguments (identifier)))) (expression_statement (call_expression (member_expression (identifier) (property_identifier)) (opt_arguments (identifier)))))

Looks like @mjambon beat me to it, but you can reference semgrep@7633d91 for my test case

(thanks for your help here @maxbrunsfeld , definitely new to tree-sitter here and still learning the ropes!)

maxbrunsfeld · 2020-09-14T19:36:20Z

grammar.js

+        $.super,
+        alias($._reserved_identifier, $.identifier)
+      )),
+      choice('[', seq('?.', '[')),


I think it would be slightly clearer to write this is as:

optional('?.'), '[',

instead of

choice('[', seq('?.', '[')),

for optional calls.

maxbrunsfeld · 2020-09-14T23:33:23Z

Yeah, this ended up requiring a bit larger of a change than I expected.

The problem was some complexity that we added in #89, to deal with the inherent ambiguity between new_expression and call_expression when parsing things like new foo.bar(). I switched to a different solution for that ambiguity, which makes it much easier to add the ?. token.

My change (a PR targeted at this branch) is at semgrep#1.

Split `?.` and `(` into separate tokens by removing `new_expression` complexity

maxbrunsfeld · 2020-09-15T02:08:33Z

Thanks for your work on this @mjambon and @nbrahms!

aryx · 2020-09-15T06:59:45Z

Fantastic. Thx a lot Max. Is there any chance we can get a glimpse of your debugging process to fix this issue. How did you get the intuition the problem was related to #89 and why all those what-seems-unrelated changes.
I know there's an art in grammar engineering, just curious if you have any tips for us so we can fix those bugs by ourselves.

maxbrunsfeld · 2020-09-15T17:45:18Z

Yes, absolutely. This is a good example to share, because it was a bit tricky for me to get it working. I've pushed a branch to illustrate my thought process step by step: https://github.com/tree-sitter/tree-sitter-javascript/commits/optional-chain-debugging-process.

In each commit, I have a message that describes the problem. I'll reiterate some of this explanation here, discussing each commit in sequence. For some commits, I'll include screenshots of the "debug graphs" that tree-sitter parse and tree-sitter test can produce if you pass the -D flag (requires graphviz to be installed).

9ba9b40

Here, I've tried to remove the opt_arguments rule, and just add an optional ?. token to call_expression. This generates the parser without error, but as @nbrahms pointed out, the following code does not parse correctly:
```
foo ?. ()
```
Here is a screenshot of the debug graph

The problem is that, in a call_expression, the leading identifier node would need to be wrapped in an _expression, because of the structure of the call_expression rule. But the parser is not creating an _expression node here, because the member_expression and subscript_expression rules allow a plain identifier, and they have a non-zero precedence (unlike _expression). So the parser has chosen to consume the ?. token without wrapping foo in an expression, in order to allow parsing the higher-precedence member_expression/subscript_expression rules, sacrificing the ability to parse a call_expression containing a ?..
4909c4f

This problem ☝️ lead me to try to simplify member_expression and subscript_expression to always parse their left hand side as an _expression. This is easier to understand, and optional chaining now parses correctly across the board 🎉 .

The problem now is that this code now parses incorrectly:
```
new Foo.Bar();
```
Desired tree:
```
(program [0, 0] - [1, 0]
  (expression_statement [0, 0] - [0, 13]
    (new_expression [0, 0] - [0, 13]
      constructor: (member_expression [0, 4] - [0, 11]
        object: (identifier [0, 4] - [0, 7])
        property: (property_identifier [0, 8] - [0, 11]))
      arguments: (arguments [0, 11] - [0, 13]))))
```
Actual tree:
```
(program [0, 0] - [1, 0]
  (expression_statement [0, 0] - [0, 13]
    (call_expression [0, 0] - [0, 13]
      function: (member_expression [0, 0] - [0, 11]
        object: (new_expression [0, 0] - [0, 7]
          constructor: (identifier [0, 4] - [0, 7]))
        property: (property_identifier [0, 8] - [0, 11]))
      arguments: (arguments [0, 11] - [0, 13]))))
```
Here is the relevant part of the debug graph:

The parser makes a wrong decision when it sees the . token: it creates a new_expression for new Foo. Two notes about this decision:
1. This decision is allowed because new_expression still allows a plain identifier after the new; it is not expected to be wrapped in an expression. This is something we changed in Remove calls from new expressions #89.
2. The parser chooses this decision because 'new_expression' has a non-zero precedence, whereas _expression itself has zero precedence (because zero is the default). So when the parser sees the '.' token in 'new Foo.Bar()', it needs to decide whether the 'Foo' is an 'expression' or part of a 'new_expression'. Because of the non-zero precedence on 'new_expression', it chooses the latter (which is wrong).

15d5e85

Next, I made the same change to new_expression: just parse the constructor part as an _expression, instead of allowing a select subset of the possible children of _expression. When I made this change, I get a failure in tree-sitter generate:

Unresolved conflict for symbol sequence:

  'new'  _expression  •  '?.'  …

Possible interpretations:

  1:  'new'  (call_expression  _expression  •  '?.'  arguments)                    (precedence: 12)
  2:  'new'  (call_expression  _expression  •  '?.'  template_string)              (precedence: 12)
  3:  'new'  (member_expression  _expression  •  '?.'  identifier)                 (precedence: 14)
  4:  'new'  (subscript_expression  _expression  •  '?.'  '['  _expressions  ']')  (precedence: 14, associativity: Right)
  5:  (new_expression  'new'  _expression)  •  '?.'  …                             (precedence: 13, associativity: Right)

Possible resolutions:

  1:  Specify a higher precedence in `call_expression` and `call_expression` and `member_expression` and `subscript_expression` than in the other rules.
  2:  Specify a higher precedence in `new_expression` than in the other rules.
  3:  Add a conflict for these rules: `call_expression`, `new_expression`, `member_expression`, `subscript_expression`

Here is how I read this conflict (I also wrote this in the commit message):

One one hand, the ?. token may be part of a member_expression or subscript_expression. In that case, we want to SHIFT the ?. token before creating the new_expression node, in order to group things
like this:

'new' ( expression '?.' identifier)
      |                           |
      ----- member_expression ----

On the other hand, the ?. token may be part of a call_expression, in which case, the parser thinks we would want to create the new_expression first, so as to group things like this:

( 'new' expression ) '?.' arguments
|                  |
-- new_expression --

This second conflicting interpretation arises because new_expression currently has higher precedence than call_expression - we set the precedences this way in PR Remove calls from new expressions #89.

1d05f78

Next I avoided this conflict by reducing the precedence of new_expression to match call_expression. This eliminates the conflicting interpretation (5) in the conflict message above, because there is no longer any reason to wrap group new foo in a new_expression before consuming the ?. token.

But this reveals a new conflict:
```
 Unresolved conflict for symbol sequence:

   'new'  _expression  •  '('  …

 Possible interpretations:

   1:  'new'  (call_expression  _expression  •  arguments)  (precedence: 12)
   2:  (new_expression  'new'  _expression  •  arguments)   (precedence: 12)
   3:  (new_expression  'new'  _expression)  •  '('  …      (precedence: 12)

 Possible resolutions:

   1:  Specify a higher precedence in `call_expression` and `new_expression` than in the other rules.
   2:  Specify a higher precedence in `new_expression` than in the other rules.
   3:  Specify a left or right associativity in `new_expression`
   4:  Add a conflict for these rules: `call_expression`, `new_expression`
```
To me, this conflict captures the fundamental problem that originally motivated us to add all of the complexity that we have encountered so far. It arises from the fact that new_expression has an optional arguments node. Intuitively, the question is, when parsing this code:
```
new Foo()
```
is it parsed
1. as (new_expression (identifier) (arguments))
2. or as (call_expression (new_expression (identifier)) (arguments))
We want it to choose the first interpretation.
e877b75

Here I decided that, rather than trying to encode ☝️ this logic using precedence (which is more efficient at runtime in some ways), I will handle this ambiguity using the GLR mechanism of trying both interpretations. I opted into this behavior explicitly by adding this entry to conflicts (as suggested by the conflict message above):
```
[$.new_expression, $.call_expression]
```
Then, to make sure that the parser would pick interpretation 1 above, I added a dynamic precedence to the arguments node inside of new_expression. This expresses the idea that the parser should prefer a tree that has this structure.

nbrahms · 2020-09-15T19:31:39Z

@maxbrunsfeld This is awesome! Thank you for providing so much detail.

mjambon · 2020-09-15T21:37:49Z

Thank you @maxbrunsfeld, this is the best GitHub comment I have ever seen! It believe it will boost our abilities a lot.

javascript. See tree-sitter/tree-sitter-javascript#137 (comment) for full details. Some conflicts remain and should be solved in a later commit, since they look different than the issue we had with javascript.

mjambon changed the title ~~WIP: Optional chaining~~ Optional chaining Aug 18, 2020

Add partial support for optional chaining: a?.b and a?.[b] but not f?…

d0bbf64

….(x).

mjambon force-pushed the optional-chains branch from 87cc26f to d0bbf64 Compare September 9, 2020 04:28

Update generated files

d79ba77

Martin Jambon and others added 3 commits September 8, 2020 21:41

Undo experimental change

b145338

Add safe dereference for function calls

dd096f4

fixup! Add safe dereference for function calls

a66eb73

maxbrunsfeld requested changes Sep 14, 2020

View reviewed changes

Martin Jambon and others added 2 commits September 14, 2020 15:09

Add failing test case for optional chaining using a proper '?.' token

dd7c216

for optional calls.

Simplify parsing of new_expression, avoid ?.( token

2068fdc

Merge pull request #1 from tree-sitter/optional-chains

34c4b83

Split `?.` and `(` into separate tokens by removing `new_expression` complexity

maxbrunsfeld merged commit 3032cf5 into tree-sitter:master Sep 15, 2020

mjambon deleted the optional-chains branch September 16, 2020 23:45

mjambon mentioned this pull request Sep 18, 2020

WIP: update for latest tree-sitter-javascript tree-sitter/tree-sitter-typescript#84

Closed

maxbrunsfeld mentioned this pull request Sep 22, 2020

New approach to handling optional chains #138

Merged

Uh oh!

Optional chaining #137

Optional chaining #137

Uh oh!

Conversation

mjambon commented Aug 13, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mjambon commented Sep 9, 2020

Uh oh!

mjambon commented Sep 11, 2020

Uh oh!

maxbrunsfeld left a comment

Choose a reason for hiding this comment

Uh oh!

maxbrunsfeld Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maxbrunsfeld Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maxbrunsfeld commented Sep 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maxbrunsfeld commented Sep 15, 2020

Uh oh!

aryx commented Sep 15, 2020

Uh oh!

maxbrunsfeld commented Sep 15, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

nbrahms commented Sep 15, 2020

Uh oh!

mjambon commented Sep 15, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mjambon commented Aug 13, 2020 •

edited

Loading

maxbrunsfeld Sep 14, 2020 •

edited

Loading

maxbrunsfeld Sep 14, 2020 •

edited

Loading

maxbrunsfeld commented Sep 14, 2020 •

edited

Loading

maxbrunsfeld commented Sep 15, 2020 •

edited

Loading