Skip to content

Conversation

@T-Gro
Copy link
Member

@T-Gro T-Gro commented Nov 8, 2023

This addresses #16220 .

If my hypothesis is correct, the difference was caused by an end-of-file marker.
In a bigger statement (e.g. with a "let" binding), rules higher up cover for an EOF.
In a standalone parsing of just the dotlambda construct, this is not covered.

Why did matter? Because EOF is one possibility for "recover", which was meant for error handling when this feature was being implemented. But EOF does not always mean an error.

@auduchinok , @DedSec256 :
I did not like the fact of special casing EOF, because most other constructs do not need it either, it is dealt with higher up.
I would more than welcome any suggestion of yours, both for the EOF as well deduplicating the rules here.

@T-Gro T-Gro requested a review from a team as a code owner November 8, 2023 13:33
@T-Gro T-Gro requested a review from auduchinok November 8, 2023 13:33
@T-Gro T-Gro changed the title Bugfix: Standalone dotlambda as the very last statement Bugfix: Standalone dotlambda just before EOF Nov 8, 2023
Copy link
Member

@auduchinok auduchinok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@T-Gro I see two problems with the existing rules:

  • Why do the rules use different rules for expression, i.e. atomicExpr vs appExpr?
    atomicExpr covers the cases with high-precedence applications, so _.M() is already covered.
    appExpr allows _.M () to be parsed but adds an additional error.

    I think it's wrong because dotLambda is an atomic expression, according to the grammar, but it currently captures arbitrary non-atomic app expressions after it, breaking many assumptions about what atomic expressions are. It also makes f _.P 123 inconsistent with other nested applications like f g 123. On top of that it may limit possible future changes in the language and parser recovery.

  • Why does the second rule have recover at all?
    It doesn't look like there something is missing in it where we expect the recovery to kick in. This extra recover (i.e. not replacing something that is missing) is the source of the issue with EOF indeed.

It seems that simply removing the second rule can fix it (it looks too wrong and interferes with the first rule), while adding this third one only buries the problem deeper.

That will allow _.P 123 to be parsed as:

appExpr
  dotLambdaExpr
    _
    .
    namedExpr
      P
  constExpr
    123

and f _.P 123 to be parsed as:

appExpr
  appExpr
    namedExpr
      f
    dotLambdaExpr
      _
      .
      namedExpr
        P
  constExpr
    123

Currently f _.P 123 is parsed as this:

appExpr
  namedExpr
    f
  dotLambdaExpr
    _
    .
    appExpr
      namedExpr
        P
      constExpr
        123

Side question: if the second rule is removed, do we need having a special precedence at all? Having less of them always makes things easier down the road.

@T-Gro
Copy link
Member Author

T-Gro commented Nov 8, 2023

@T-Gro I see two problems with the existing rules:

  • Why do the rules use different rules for expression, i.e. atomicExpr vs appExpr?
    atomicExpr covers the cases with high-precedence applications, so _.M() is already covered.
    appExpr allows _.M () to be parsed but adds an additional error. It looks wrong to me, we should only parse atomic expressions here as a part of shorthand lambdas.
  • Why does the second rule have recover at all?
    It doesn't look like there something is missing in it where we expect the recovery to kick in. This extra recover (i.e. not replacing something that is missing) is the source of the issue with EOF indeed.

It seems that simply removing the second rule can fix it (it looks too wrong and interferes with the first rule), while adding this third one only buries the problem deeper.

That will allow _.P 123 to be parsed as:

appExpr
  dotLambdaExpr
    _
    .
    namedExpr
      P
  constExpr
    123

and f _.P 123 to be parsed as:

appExpr
  appExpr
    namedExpr
      f
    dotLambdaExpr
      _
      .
      namedExpr
        P
  constExpr
    123

Currently f _.P 123 is parsed as this:

appExpr
  namedExpr
    f
  dotLambdaExpr
    _
    .
    appExpr
      namedExpr
        P
      constExpr
        123

It seems wrong and a bit too limiting to possible future changes in the language and parser recovery. It also doesn't seem consistent with other nested applications like f g 123.

Side question: if the second rule is removed, do we need having a special precedence at all? Having less of them always makes things easier down the road.

The second rule only exists in order to give a meaningful error message instead of a generic one, while keeping some parser recovery working.
But true, the "recover" rule is not needed there at all.

@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

The second rule only exists in order to give a meaningful error message instead of a generic one, while keeping some parser recovery working.

Could you please show an example of the generic error message when this rule is removed? I think it's very important to fix the atomic expressions being non-atomic here, and the only way I see is to remove this rule.

@T-Gro
Copy link
Member Author

T-Gro commented Nov 9, 2023

The second rule only exists in order to give a meaningful error message instead of a generic one, while keeping some parser recovery working.

Could you please show an example of the generic error message when this rule is removed? I think it's very important to fix the atomic expressions non being atomic here, and the only way I see is to remove this rule.

It is now visible in the diff + I added commented explanations.
I see three options:

  • I move the "nonatomic" error into typechecker
  • I keep 1 rule only in the parser, but implemented addition logic inside the rule to separate atomic/nonatomic
  • I revert to two rules, one passing (atomic) and one failing (nonatomic), but will avoid using the 'recover' in the rule.

The second one looks best to me, WDYT?

@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

@T-Gro Thanks, all the examples are good. Looking at these, I think the first option is probably the best as things stand now. A simple check in the type checker could be checking that dot lambdas aren't allowed as function expressions in application expressions:

It would make this allowed, because _.P is the argument expression in the inner app expression:

let _ = f _.P 123
letBinding
  appExpr
    appExpr
      namedExpr
        f
      dotLambdaExpr
        _
        .
        namedExpr
          P
    constExpr
      123

And this would not be allowed, because _.P is the function expr:

let _ = _.P 123
letBinding
  appExpr
    dotLambdaExpr
      _
      .
      namedExpr
        P
    constExpr
      123

Can the special precedence also be removed if non-atomic expressions are no longer allowed there?

I keep 1 rule only in the parser, but implemented addition logic inside the rule to separate atomic/nonatomic

That would still mean we're parsing 'top-level' non-atomic expression inside an atomic one. I think we should avoid it, it makes the whole grammar less sound.

Copy link
Member

@auduchinok auduchinok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me now, it's good to have the underlying problem fixed!

@T-Gro
Copy link
Member Author

T-Gro commented Nov 9, 2023

Agree.

I also wanted to remove the %prec, tried that, but I started to get test failures which did not repro locally.
But it meant that _.Prop was getting parsed as an identifier and dot access, and not as a dot lambda, I wasn't sure why since I could not repro.

=> I will keep that outside of this PR, so that it can get it as it is.

== this is ready for reviews and merging it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants