Skip to content

Conversation

nnethercote
Copy link
Contributor

Some cleanups I found while working on RFC 3349 that are worth landing separately.

r? @fee1-dead

The parser already does a check-only unescaping which catches all
errors. So the checking done in `from_token_lit` never hits.

But literals causing warnings can still occur in `from_token_lit`. So
the commit changes `str-escape.rs` to use byte string literals and C
string literals as well, to give better coverage and ensure the new
assertions in `from_token_lit` are correct.
The `CString` handling code is erroneously identical to the `ByteString`
handling code.
The `T` type in these functions took me some time to understand, and I
find the explicit `T` in the use of `from` makes the code easier to
read, as does the `u8` annotation in `scan_escape`.
- Rename it as `MixedUnit`, because it will soon be used in more than
  just C string literals.
- Change the `Byte` variant to `HighByte` and use it only for
  `\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars
  could be encoded with either `Byte` or `Char`.
- Add useful comments.
- Remove `is_ascii`, in favour of `u8::is_ascii`.
I find it easier if they describe what's allowed, rather than what's
forbidden. Also, consistent naming makes them easier to understand.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
They can't contain `\x` escapes, which means they can't contain high
bytes, which means we can used `unescape_unicode` instead of
`unescape_mixed` to unescape them. This avoids unnecessary used of
`MixedUnit`.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 25, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jan 25, 2024

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

Copy link
Member

@fee1-dead fee1-dead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@bors r+

@fee1-dead
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Jan 26, 2024

📌 Commit 6be2e56 has been approved by fee1-dead

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2024
@nnethercote
Copy link
Contributor Author

Thanks for the fast review :)

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 26, 2024
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#117420 (Make `#![allow_internal_unstable(..)]` work with `stmt_expr_attributes`)
 - rust-lang#117678 (Stabilize `slice_group_by`)
 - rust-lang#119917 (Remove special-case handling of `vec.split_off(0)`)
 - rust-lang#120117 (Update `std::io::Error::downcast` return type)
 - rust-lang#120329 (RFC 3349 precursors)
 - rust-lang#120339 (privacy: Refactor top-level visiting in `NamePrivacyVisitor`)
 - rust-lang#120345 (Clippy subtree update)
 - rust-lang#120360 (Don't fire `OPAQUE_HIDDEN_INFERRED_BOUND` on sized return of AFIT)
 - rust-lang#120372 (Fix outdated comment on Box)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 5f1f617 into rust-lang:master Jan 26, 2024
@rustbot rustbot added this to the 1.77.0 milestone Jan 26, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 26, 2024
Rollup merge of rust-lang#120329 - nnethercote:3349-precursors, r=fee1-dead

RFC 3349 precursors

Some cleanups I found while working on RFC 3349 that are worth landing separately.

r? `@fee1-dead`
@nnethercote nnethercote deleted the 3349-precursors branch January 28, 2024 20:32
nnethercote added a commit to nnethercote/rust that referenced this pull request Jan 29, 2024
Currently the parser will interpret any label in certain positions as a
mistyped char literal, on the assumption that the trailing single quote
was accidentally omitted. This is reasonable for a label like 'a (because
'a' would be valid) but not reasonable for a label like 'abc (because
'abc' is not valid).

This commit restricts this behaviour only to labels that would be valid
char literals, via the new `could_be_unclosed_char_literal` function.
The commit also augments the `label-is-actually-char.rs` test in a
couple of ways:
- Adds testing of labels with identifiers longer than one char, e.g.
  'abc.
- Adds a new match with simpler patterns, because the
  `recover_unclosed_char` call in `parse_pat_with_range_pat` was not
  being exercised (in this test or any other ui tests).

Fixes rust-lang#120397, an assertion failure, which was caused by this behaviour
in the parser interacting with some new stricter char literal checking
added in rust-lang#120329.
nnethercote added a commit to nnethercote/rust that referenced this pull request Jan 29, 2024
…r literal.

Currently the parser will interpret any label/lifetime in certain
positions as a mistyped char literal, on the assumption that the
trailing single quote was accidentally omitted. This is reasonable for a
something like 'a (because 'a' would be valid) but not reasonable for a
something like 'abc (because 'abc' is not valid).

This commit restricts this behaviour only to labels/lifetimes that would
be valid char literals, via the new `could_be_unclosed_char_literal`
function. The commit also augments the `label-is-actually-char.rs` test
in a couple of ways:
- Adds testing of labels/lifetimes with identifiers longer than one
  char, e.g. 'abc.
- Adds a new match with simpler patterns, because the
  `recover_unclosed_char` call in `parse_pat_with_range_pat` was not
  being exercised (in this test or any other ui tests).

Fixes rust-lang#120397, an assertion failure, which was caused by this behaviour
in the parser interacting with some new stricter char literal checking
added in rust-lang#120329.
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 30, 2024
…-errors

Be more careful about interpreting a label/lifetime as a mistyped char literal.

Currently the parser interprets any label/lifetime in certain positions as a mistyped char literal, on the assumption that the trailing single quote was accidentally omitted. In such cases it gives an error with a suggestion to add the trailing single quote, and then puts the appropriate char literal into the AST. This behaviour was introduced in rust-lang#101293.

This is reasonable for a case like this:
```
let c = 'a;
```
because `'a'` is a valid char literal. It's less reasonable for a case like this:
```
let c = 'abc;
```
because `'abc'` is not a valid char literal.

Prior to rust-lang#120329 this could result in some sub-optimal suggestions in error messages, but nothing else. But rust-lang#120329 changed `LitKind::from_token_lit` to assume that the char/byte/string literals it receives are valid, and to assert if not. This is reasonable because the lexer does not produce invalid char/byte/string literals in general. But in this "interpret label/lifetime as unclosed char literal" case the parser can produce an invalid char literal with contents such as `abc`, which triggers an assertion failure.

This PR changes the parser so it's more cautious about interpreting labels/lifetimes as unclosed char literals.

Fixes rust-lang#120397.

r? `@compiler-errors`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Jan 30, 2024
Rollup merge of rust-lang#120460 - nnethercote:fix-120397, r=compiler-errors

Be more careful about interpreting a label/lifetime as a mistyped char literal.

Currently the parser interprets any label/lifetime in certain positions as a mistyped char literal, on the assumption that the trailing single quote was accidentally omitted. In such cases it gives an error with a suggestion to add the trailing single quote, and then puts the appropriate char literal into the AST. This behaviour was introduced in rust-lang#101293.

This is reasonable for a case like this:
```
let c = 'a;
```
because `'a'` is a valid char literal. It's less reasonable for a case like this:
```
let c = 'abc;
```
because `'abc'` is not a valid char literal.

Prior to rust-lang#120329 this could result in some sub-optimal suggestions in error messages, but nothing else. But rust-lang#120329 changed `LitKind::from_token_lit` to assume that the char/byte/string literals it receives are valid, and to assert if not. This is reasonable because the lexer does not produce invalid char/byte/string literals in general. But in this "interpret label/lifetime as unclosed char literal" case the parser can produce an invalid char literal with contents such as `abc`, which triggers an assertion failure.

This PR changes the parser so it's more cautious about interpreting labels/lifetimes as unclosed char literals.

Fixes rust-lang#120397.

r? `@compiler-errors`
flip1995 pushed a commit to flip1995/rust that referenced this pull request Feb 8, 2024
…1-dead

RFC 3349 precursors

Some cleanups I found while working on RFC 3349 that are worth landing separately.

r? `@fee1-dead`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants