Skip to content

Conversation

@auduchinok
Copy link
Member

@auduchinok auduchinok commented Jun 22, 2022

Adds the parser recovery on missing first and intermediate items in a tuple expression, e.g. when adding a new item in the front of existing ones or between them, or when replacing an existing item.

The error is placed on the comma range:

Screenshot 2022-06-22 at 17 25 13

The recovered tree is available for analysis and can be used in features like Parameter Info popup:

Screenshot 2022-06-22 at 17 13 41

The change makes the parser produce the correct tree structure for cases like (,1,1)
, (1,,1), and even cases with multiple missing items like (,1,,2,3,,4,).

@dsyme
Copy link
Contributor

dsyme commented Jun 22, 2022

Looks sort of ok.

However... I wonder if we could do this much more generically across all the parser rules.

Currently

> (, 3);;

  (, 3);;
  -^

stdin(1,2): error FS0010: Unexpected symbol ',' in interaction

I'm wondering why this doesn't at least say Unexpected symbol ',' in expression - and whether there is enough information to say Unexpected symbol ',' - expected expression.

This message is produced by this code for processing any syntax error. It has access to the parser stack at the point of failure - it then looks down this stack for something it recognises, and in this case the first thing recognised is an "interaction" that is an input to FSI. That indicates that the parser stack has been unwound more than we would like. However there may be a way to avoid that.

Doing the work to adjust the generic grammar process code would have the advantage that it might improve all syntax errors, and mean we don't have to write specific rules in the grammar for cases like this. It would also cover cases like

    ( , )  // unexpected COMMA, expected expression

     let f (x: List< , int>) =  // unexpected COMMA, expected type

I think the underlying problem with doing this generically is that we don't know which subset of unexpected syntax-error tokens (COMMA, SEMICOLON, ARROW etc) should get this treatment (keeping the stack as "declExpr" or similar), and which unexpected tokens should result in an unwind of the stack (e.g. TYPE, MODULE etc). In this case you're hard-wiring one of these tokens.

@auduchinok
Copy link
Member Author

auduchinok commented Jun 22, 2022

In addition to having the parser recovered, we're also interested in getting some node for the missing code. Even if we change the generic syntax error recovery code to produce better errors, how do we get a SynExpr, SynPat, or SynType to be used in the corresponding grammar rules? It seems we'd need to modify all such rules to include the recovered productions anyway.

Or am I misunderstanding how a more generic recovery could work here? Would it somehow produce an empty/error expression (or, say, a pattern/type) whenever an expression is expected but got a comma instead?

@dsyme
Copy link
Contributor

dsyme commented Jun 22, 2022

In addition to having parser recovered, we're also interested in getting some node for the missing code. Even if we change the generic syntax error recovery code to produce better errors, how do we get a SynExpr, SynPat, or SynType to be used in the corresponding grammar rules? It seems we'd need to modify all such cases to include the recovered case anyway.

True. I guess COMMA declExpr and COMMA and so on have a special role here as it's possible to do good repair in these cases

However looking at fantomas preview tooling it seems we're getting a good node in this case already? https://fsprojects.github.io/fantomas-tools/#/ast?data=N4KABGBEAmCmBmBLAdrAzpAXFSAacUiaAYmolmPAIYA2as%2BEkAxgPZwWQAUYuYAjAEoAOgCdhySCAC%2BQA

Which makes me wonder why we are saying "in declaration" or "in interaction" instead of "in expression"....

Note that the SyntaxError processing relies on floating the syntax error up to some construct that has error or recover recovery. It looks like it's triggering this one:

| LPAREN recover %prec prec_atomexpr_lparen_error
.

The case

( , )

seems to also give an OK AST

I'm not saying these are perfect, and it would be wonderful if our grammar didn't have any rules like

| LPAREN recover %prec prec_atomexpr_lparen_error
and we could somehow do the recovery processing more generically while still getting good repair.

@dsyme
Copy link
Contributor

dsyme commented Jun 22, 2022

For comparison:

The final pattern one looks a bit whacky, the tuple contains the right-hand-side expression

@auduchinok
Copy link
Member Author

auduchinok commented Jun 22, 2022

However looking at fantomas preview tooling it seems we're getting a good node in this case already?

The parens are lost and the first item range is wrong, though. An app expression f( , 1) has a completely different structure and produces an error about an unmatched paren (so it sounds like it has something to do with the precedence of the comma?).

With the change from this PR both ( , 1) and f( , 1) produce expected results with correct expression structures, e.g. f( , 1) parses to this:

IPrefixAppExpr
  IReferenceExpr
    FSharpIdentifierToken(type:IDENTIFIER, text:f)
  IParenExpr
    FSharpTokenType+LparenTokenElement(type:LPAREN, text:()
    Whitespace(type:WHITE_SPACE, text: ) spaces:" "
    ITupleExpr
      IFromErrorExpr
      FSharpTokenType+CommaTokenElement(type:COMMA, text:,)
      Whitespace(type:WHITE_SPACE, text: ) spaces:" "
      ILiteralExpr
        FSharpToken(type:INT32, text:1)
    FSharpTokenType+RparenTokenElement(type:RPAREN, text:))

@dsyme
Copy link
Contributor

dsyme commented Jun 22, 2022

OK, given the existence of other error recovery rules in tupleExpr I can see this makes sense to handle as a specific case

e.g.

| declExpr COMMA ends_coming_soon_or_recover

Do you think you could write out a test matrix for all combinations up to size 3 pinning down the error recovery please?

 (,)
 (,a)
 (a,)
 (,b,c)
 (a,,c)
 (a,b,)
 (,,c)
 (,b,)
 (a,,)
 (,,)

Also could you look at similar matrix for type arguments <,int> and tuple patterns?

Thanks

@auduchinok
Copy link
Member Author

Do you think you could write out a test matrix for all combinations up to size 3 pinning down the error recovery please?

Yes, I was going to look into adding new parser tests once the existing CI is green.

@dsyme
Copy link
Contributor

dsyme commented Jun 22, 2022

Looking at the pattern rules for tuples, it would be kind of nice if they were exactly symmetric in structure with the ones for expressions - they certainly aren't at the moment

There are two cases really - "simple patterns" that always have parens (e.g. used as constructore arguments) and tuple patterns e.g. let x, y = pair

| tuplePatternElements COMMA headBindingPattern

https://github.com/dotnet/fsharp/blob/main/src/Compiler/pars.fsy#L3256

@auduchinok
Copy link
Member Author

auduchinok commented Jun 23, 2022

@dsyme Thanks for suggesting to test it on tuples of size 3! It uncovered that the initial implementation didn't work properly on them and produced inner tuple expressions at recovery points. The only way I found to recognize that the parser recovered from such a situation was to check the result in the parent rule. What do you think about this approach?

@dsyme
Copy link
Contributor

dsyme commented Jun 23, 2022

I'd be happy to take this as is, but while you're in context I'm wondering if it's worth adding tests for "(1," and other incomplete tuples?

If they don't pass it's OK, we can still accept, it would just be good to get these cases pinned down if they pass.

@dsyme
Copy link
Contributor

dsyme commented Jun 23, 2022

Code needs formatting :)

@auduchinok
Copy link
Member Author

auduchinok commented Jun 23, 2022

I'd be happy to take this as is, but while you're in context I'm wondering if it's worth adding tests for "(1," and other incomplete tuples?

You mean incomplete tuples inside incomplete paren expressions specifically? The incomplete tuple cases are already present in this PR.

* add an extra line break in a line that has less than 120 chars
* add an extra space after an capitalized pattern
@auduchinok
Copy link
Member Author

@dsyme Done.

@auduchinok auduchinok changed the title Parser: recover on missing first item in tuple expression Parser: recover on missing items in tuple expression Jun 23, 2022
@vzarytovskii vzarytovskii added Area-Compiler-Syntax lexfilter, indentation and parsing Feature Improvement labels Jun 23, 2022
@vzarytovskii vzarytovskii added this to the Backlog milestone Jun 23, 2022
@dsyme
Copy link
Contributor

dsyme commented Jun 23, 2022

@auduchinok I meant the same systematic test matrix but each without the closing parens please :) May as well get it all pinned down.

@auduchinok
Copy link
Member Author

auduchinok commented Jun 24, 2022

@dsyme I think unfinished parens aren't that interesting: IDEs insert the closing parens automatically. A much more interesting thing to look into is more cases with unfinished tuples (i.e. where the last element is missing) but I'd rather work on it separately, keeping this PR smaller and focused on missing elements before commas. After working on that it'd be much more interesting to pin more parens examples.

@auduchinok
Copy link
Member Author

auduchinok commented Jun 24, 2022

OK, missing last items are also recovered now, let's see if anything fails.

@dsyme
Copy link
Contributor

dsyme commented Jun 28, 2022

@dsyme I think unfinished parens aren't that interesting: IDEs insert the closing parens automatically.

I understand, but regardless I feel we should pin down any positive recovery we have for these, so we don't regress in the future. It's common enough that people delete a )

I will merge this - if you do get a chance to pin down these cases I'd appreciate it, thanks

@dsyme dsyme merged commit a7617f9 into dotnet:main Jun 28, 2022
@dsyme
Copy link
Contributor

dsyme commented Jun 28, 2022

Great work btw!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-Compiler-Syntax lexfilter, indentation and parsing Feature Improvement

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants