-
Notifications
You must be signed in to change notification settings - Fork 3.7k
FSharp fixes #3357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FSharp fixes #3357
Conversation
Since the name are not reserved and can be used for variable/binding names, it can be confusing and misleading.
* Type names can contain apostrophes. * Type symbols can contain quoted identifiers, but no apostrophes.
* We now highlight operators. * Type names are no longer highlighted globally. * I added new tests (operator examples in operators.txt, identifier edge cases in types.txt)
src/languages/fsharp.js
Outdated
| scope: 'operator', | ||
| // Try to match all valid combinations | ||
| match: regex.either( | ||
| /<?@@?>?/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're not doing this in an array (is there a good reason that can't work?) please add comments above each line breaking these out for readability. This is usually much harder to maintain than a simple list/array though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could decompose most of the patterns in individual operators (is that what you meant?), but it would probably take a lot more space.
I'll try another approach...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability trumps space. This all gets gzipped for most people anyways. At some point maybe one could argue speed (of the more complex regex)... but long-term it'd still be nicer to have a "pretty" readable array and then some brains to compile that into "smart" regex (such tools exist I think)... rather than having more difficult to read regex and no pretty list.
I was offering you the choice though. 😀 If we leave them compacted I'd like to see what they are intending to match written above them as a rough "guide".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified operators matching, though I had to add a CHAINED_DOT_GUARD that feels a bit hacky. Let me know what you think...
| begin: /^\s*\[</, | ||
| excludeBegin: true, | ||
| end: regex.lookahead(/>\]/), | ||
| end: />\]/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you've expanded this to include the <> inside the scope now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because now that I added operators, I didn't want the [< and >] around an attribute definition to be highlighted as operators.
Actually I had to change this because as it was before it was not symmetric: IIRC the begin part was not assigned any scope, but the end was matched as an operator.
I had to choose between matching everything as meta, matching begin and end as operators, or excluding both the begin and the end.
Matching everything as meta seems like a good compromise, and it can be changed later on if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to choose between...
No... you could also have used a multi-matcher to match the <, content, > separately while only applying the scope to the content, if that would have been preferable. As long as the rule consumes it another rule (operator) won't match it.. ie:
match: [ /</ , ... , />/ ],
scope: { 2: "meta" }You get the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
The only issue with a multi-matcher is that I don't see how to use both a multi-matcher and contains for nested modes.
Honestly I don't know what's preferable between highlighting the brackets or not. Both seem fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only issue with a multi-matcher is that I don't see how to use both a multi-matcher and contains for nested modes.
Can you give a specific example you're having difficulty with? Both begin and end can individually be multi-matchers with contains in-between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say I wanted to define a beginning ([<), an end (>]), and declare that everything in between that's not in contains has a meta scope.
Would a multi-matcher be appropriate? Matching the attribute name is a bit tricky because the syntax of an attribute can be relatively complex.
Here are some examples:
[<Attribute>] // simple attribute
[<Attribute("attribute with a string value and an int", 42)>]
// But they can also contain a target:
let function1 x : [<return: MyCustomAttributeThatWorksOnReturns>] int = x + 1
// And quoted identifiers!
// (which can contain almost anything and should very much be added to `contains`
// to avoid breaking everything if it looks something like ``my evil <] attribute name``)
[<``module``: MyCustomAttributeThatWorksOnModules>]I found that simply applying a meta scope on the whole mode is the easiest, but I could be missing something.
If we don't want to highlight the begin and the end, it's then easy to just add excludeBegin: true and excludeEnd: true (a look-ahead on end would allow other modes, namely the OPERATOR one, to highlight it wrongly, and I couldn't find a way to use beginScope or endScope to reset the scope just for begin and end)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may still be lost my point was only that you're fully in charge of scopes.. if you wanted to exclude the boundaries [< and >] you could do that either with exclude* or with multi-match... but if you wanted JUST to say highlight them multi-match is good at that.
And nothing about multi-match precludes using contains... contains has a chance to run immediately after begin/match. You'd think it requires end, but every mode has an end it's just "match anything" for modes with no explicit end... but that still allows for contains rules to trigger in between begin and "match anything".
Though in this case we do have an obvious beginning and end, so if you were using contains you'd probably use the obvious begin/end combo.
Let's say I wanted to define a beginning ([<), an end (>]), and declare that everything in between that's not in contains has a meta scope.
You can't do this though... scopes are over-arching - they don't turn on and off when they find rules in the middle. So either the WHOLE construct would be meta or you'd have to individually pick things to be meta with sub-rules inside contains.
(a look-ahead on end would allow other modes, namely the OPERATOR one, to highlight it wrongly
Right I wasn't suggesting this. We use look-ahead on end when we WANT that behavior (to not consume)... when we do want to consume then a normal end is perfect. (which can be excluded or part of a non-scoped multi-match)
If you don't have this "perfect" to your satisfaction yet perhaps a small code example with the desired perfect HTML would show me what you're trying to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may still be lost my point was only that you're fully in charge of scopes.
Ok, I got that^^
If you don't have this "perfect" to your satisfaction yet perhaps a small code example with the desired perfect HTML would show me what you're trying to do?
Thank you for proposing your help. I generally know how to obtain the results I want (in the worst case by trial and error). It's just that I'm never completely sure if I'm doing it by the book. (With you suggesting a multi-matcher, I was wondering if I misunderstood something)
Anyway, honestly I'm not even certain how it should be highlighted 😅
I think highlighting the whole thing — boundaries included — is best for now, so if that's ok with you I will stay with something like the following:
{
// e.g. [<Attributes("")>] or [<``module``: MyCustomAttributeThatWorksOnModules>]
scope: 'meta',
begin: /\[</,
end: />\]/,
contains: [
QUOTED_IDENTIFIER,
// can contain any constant value
TRIPLE_QUOTED_STRING,
VERBATIM_STRING,
QUOTED_STRING,
CHAR_LITERAL,
NUMBER
]
}(I find the biggest difficulty in all this is to refrain myself from adding unrelated adjustments each time I'm seeing a new case that's not highlighted correctly. I know I have to stop somewhere if I want you to review it, but it's very difficult!!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think highlighting the whole thing — boundaries included — is best for now,
We can try it and see how it works in practice... seems like a reasonable start even if later the meta needs to be toned down a bit.
I find the biggest difficulty in all this is to refrain myself from adding unrelated adjustments each time I'm seeing a new case that's not highlighted correctly. I
Yeah... but good news is you can always open a new PR after. And if you manage your commits properly you can just keep working and then pull them into a new PR after - though that's harder with lots of dependent changes.
Is this common though? Losing types feels like a real loss. |
I think naming a variable or a function parameter Another problem is that the hard-coded list of primitive types we had previously is a bit arbitrary anyway: why should The more correct way to do this would be to highlight type annotations from the context, but it turns out it's more difficult to do properly than I initially thought... I'd rather remove the fixed list of type names now, and maybe somebody can implement type annotation highlighting from context later. In the meantime, I'm not sure it's actually that big of a loss: F# uses type inference extensively to assign a type to things, and type annotations are added only when the compiler cannot deduce a type based on context, which is not that often. See the tour of F# as an example, and look for type annotations (look for what's generally after a |
We probably match some invalid combinations, but we also match custom operators better (in one span instead of several)
|
Well, I thought I would give highlighting type annotations another try... It seems to work well for the cases I tested, but the grammar is a bit complicated despite my efforts. There is at least one documented bug for a syntax I didn't know was possible... |
src/languages/fsharp.js
Outdated
| const TYPE_ANNOTATION = { | ||
| scope: 'type', | ||
| begin: regex.concat( // a type annotation is a | ||
| /:/, // colon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does F sharp not using : for ternary or object attributes, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no ternary in F#, and I'm not aware of any usage of : other than to create a type annotation.
The symbol and operator reference seems to confirm this.
BTW, I found the F# grammar used by the main F# VsCode extension: https://github.com/ionide/ionide-fsgrammar/blob/master/grammars/fsharp.json
I'm not sure how useful it is for highlight.js, but this is certainly a good resource to look at...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no ternary in F#, and I'm not aware of any usage of : other than to create a type annotation.
If true that's a stroke of luck... :)
src/languages/fsharp.js
Outdated
| const OPERATOR = { | ||
| scope: 'operator', | ||
| // Try to match all valid combinations, maybe a bit more (but not that much since custom operators are possible)... | ||
| match: /[<>@~&^+=:%*!$\/\?\|\-\.]+/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the opposite of what we discussed. We want the entire list broken out so we can easily read it and update it in the future - and prevent false positives, etc. Even the first iteration of OPERATOR was better (had it been commented to list the operators each regex was catching)...
Even if there is a "catch-all" for custom operators that should be separate from catching the built-in operators. Eventually we may need to manage the relevance on these things and having them broken out is a huge boon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I naively thought matching any combination of this list of operator characters would be a welcomed simplification.
I'll go back to an expanded list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps welcomed until we have to debug/fix/improve it one day or try to adjust the relevance of individual operators in the future. :-) We'd probably have different policies if we were dealing with a single perfect grammar rather than 180 that have to happily co-exist with each other in complex ways. :)
src/languages/fsharp.js
Outdated
|
|
||
| const CHAINED_DOT_GUARD = { | ||
| // Prevent the dot in Name.Name patterns from being matched as an operator. | ||
| match: /(\w|')\.(\w)/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this match the whole expression though? This will break for something like:
Name.nkeyword
Where this would guard the e.n but then keyword would be highlighted as a keyword. No?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are absolutely right, I spotted this only after committing.
This whole mode is a hack that I hope I will be able to remove once I rewrite operator matching.
And follow the official grammar more closely
In favor of supporting some annotations that cannot be matched otherwise.
- Add the `get` built_in so it's symmetric with `set` - Fix matching the end of tupled types - Set type annotation relevance to 0 to fix language auto-detection - Fix an edge case for type definition matches The biggest changes are for attributes: - They don't have to be at the beginning of a line anymore - They can contain multiple sub-modes
|
Summary of the latest commits:
I left a TODO with several alternative ways to declare the range of operator characters. I'm not sure which one you prefer so let me know. The syntax currently in use has the advantage of properly escaping all the regex characters (I wasted an hour because I hadn't escaped a const OPERATOR_CHARS = Array.from("!%&*+-/<=>@^|~?");
const OPERATOR_CHAR_RE = regex.concat('[', ...OPERATOR_CHARS.map(regex.escape), ']');
// TODO: choose one of the alternative definitions:
// const OPERATOR_CHAR_RE = regex.either(...OPERATOR_CHARS.map(regex.escape));
// const OPERATOR_CHAR_RE = /[!%&*+\-\/<=>@^|~?]/; |
src/languages/fsharp.js
Outdated
| const PARENS_TYPE_ANNOTATION = { | ||
| begin: /\(/, | ||
| end: /\)/, | ||
| relevance: 0 | ||
| }; | ||
| const ANONYMOUS_RECORD_TYPE_ANNOTATION = { | ||
| begin: /{\|/, | ||
| end: /\|}/ | ||
| }; | ||
| const TYPE_ANNOTATION = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these two rules accomplishing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They prevent the TYPE_ANNOTATION mode from ending when a ) or |} that is part of a nested type annotation is found.
e.g.
// The type of myBinding is an anonymous record,
// whose X label is also an anonymous record:
let myBinding: {| X: {| Nested: bool |}; Y: string |} = ()
// f is a function returning a bool whose
// first argument is a function taking a string an returning an int:
let f: (string -> int) -> bool = ()These modes are mutually recursive, so their contains are defined further down.
|
Is |
|
I've taken a much closer look at the type stuff now. It's going to need some changes. In general (in the core library) we do not support "run-on" or large block scopes.
Now it's possible you could still use IE:
We [mostly] only care about the individual keywords. We don't care about the larger syntactic "type definition". I'm sure we break this rule in other grammars, but it's usually either historic or out of laziness and it's something we should fix there as well, not encourage further. |
I would argue that a type annotation is not a type definition and it's usually much shorter, but I understand your point, and it's true annotations can get big and have a relatively complex syntax as you may have seen in the examples.. Can you confirm the issue is not the size of the grammar, but the output with I decided to highlight almost the whole annotation the same way because it was easier, but I guess I could be more picky and only highlight the actual type names. The grammar would probably be a bit bigger. Would that be ok? |
Forgive me, mixing up languages or terminology, but I think you got the point.
Well, it's not either/or exactly... though typically complexity is what I care about more than size. And when things start to smell too much like "parsers"... we try to be largely a simple pattern matcher the core grammars, we don't try to be a contextual grammar parser for individual languages. Right now we have a single nested context ("type annotation") with clear & clearly commented start and end conditions, so that seems "ok" I think... though as soon as F# adds ternary or other uses of
How so, other than adding the list of type name back? |
I really wanted the hard coded type list to go. Without any context it's almost meaningless, and if we keep the
Ok. I was trying to keep it simple too, but there are always edge cases that don't work unless we add context. I'll see if I can find a solution without adding more complexity, but maybe only highlighting some known type names when they are after what looks like a type annotation is indeed a good enough compromise... |
Then since you have them lets keep them and try and get the insides right.
MODES, for |
|
I'm sorry I'm not sure I follow. Do you mean you want to get rid of the mutually recursive modes inside the main If so, I will see what I can do yes. |
|
No, I was only saying that |
|
Oh ok, thanks for the clarification. |
It's impossible to do a clean job without parsing much of the grammar, so instead we are going back to a list of known types... What's new compared to before is that we now have a simple type annotation context detection, and are only matching these known type names in this context.
|
Well, I tried to continue matching type annotations without a hard-coded list of types while keeping it simple, but I don't think I will be able to get to something that does not break without parsing much more of the F# grammar. So I decided to follow your advice and stop trying to be perfect. I went back to a hard-coded list of known types that I only highlight when a type annotation context is detected. The parsing logic is much more simple, and it does a reasonable job for known types. I'm awaiting your feedback. BTW there is still the |
|
Sorry to only comment on small stuff for now, I think this is looking better overall but I'll need more time to more carefully review it later. |
| <span class="hljs-keyword">open</span> Microsoft.FSharp.Quotations | ||
| <span class="hljs-comment">// A typed code quotation.</span> | ||
| <span class="hljs-keyword">let</span> expr : Expr<<span class="hljs-type">int</span>> = <@ <span class="hljs-number">1</span> + <span class="hljs-number">1</span> @> | ||
| <span class="hljs-keyword">let</span> expr <span class="hljs-operator">:</span> Expr<span class="hljs-operator"><</span><span class="hljs-type">int</span><span class="hljs-operator">></span> <span class="hljs-operator">=</span> <span class="hljs-operator"><@</span> 1 <span class="hljs-operator">+</span> 1 <span class="hljs-operator">@></span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to number here getting lost? Should 1 not still be a number?
Was just about ready to pull the trigger on merge until this caught my eye.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Well spotted, this is a little embarrassing...
This should be fixed now.
'=' is supposed to end a type annotation, but since the type annotation can contain operators, '=' was being consumed as content instead... I also removed the TODO with commented-out code (initially added for discussion, but it might not have been the right place)
# Conflicts: # CHANGES.md
|
Hello, Sorry for the delay. It should be ok now, hopefully... |
src/languages/fsharp.js
Outdated
| const OPERATOR = makeOperatorMode(true); | ||
| // This variant is used when matching '=' should end a parent mode: | ||
| const OPERATOR_WITHOUT_EQUAL = makeOperatorMode(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true and false have no meaning (in context) and as such are difficult to read... use named arguments/object literals instead:
const OPERATOR = makeOperatorMode({ includeEqual: false});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
It's the first time I use destructuring like that, so let me know if I did something wrong.
|
@mlaily Thanks for all the time and effort! |
|
Can you confirm your change list (in the PR notes) is correct [or update it] so I can use that as the content for the squashed commit comments... |
If you are referring to the description of the PR/the first comment, I reviewed and updated it.
Thank you very much for reviewing this PR. I'm sorry it ended up this big and took so many back and forths. |

Hello !
This is a follow up PR for #3348.
I realized some things were not optimal or needed fixing...
I did not create a new issue. Let me know if this is ok or if you prefer I create a new one...
Changes
``...``. They can contain pretty much anything inside the double backquotes)Checklist
CHANGES.md