Skip to content

make tidy-alphabetical use a natural sort #141311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

folkertdev
Copy link
Contributor

The idea here is that these lines should be correctly sorted, even though a naive string comparison would say they are not:

foo2
foo10

This is the "natural sort order".

There is more discussion in #t-compiler/help > tidy natural sort

Unfortunately, no standard sorting tools are smart enough to to this automatically (casting some doubt on whether we should make this change). Here are some sort outputs:

> cat foo.txt | sort
foo
foo1
foo10
foo2
mp
mp1e2
np", 
np1e2", 
> cat foo.txt | sort -n
foo
foo1
foo10
foo2
mp
mp1e2
np", 
np1e2", 
> cat foo.txt | sort -V
foo
foo1
foo2
foo10
mp
mp1e2
np1e2", 
np", 

Disappointingly, "numeric" sort does not actually have the behavior we want. It only sorts by numeric value if the line starts with a number. The "version" sort looks promising, but does something very unintuitive if you look at the final 4 values. None of the other options seem to have the desired behavior in all cases:

  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           shuffle, but group identical keys.  See shuf(1)
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text

r? @Noratrieb (it sounded like you know this code?)

@rustbot rustbot added A-tidy Area: The tidy tool S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 20, 2025
@joshtriplett
Copy link
Member

@folkertdev We have a lot of redundant versions of this sort. See the algorithm documented in the style guide; is that algorithm sufficient here? If so, that algorithm is implemented in rustfmt.

@folkertdev
Copy link
Contributor Author

Oh, right, https://github.com/rust-lang/rustfmt/blob/master/src/sort.rs is probably more robust (e.g. it handles leading/repeated zeros, which my version does not). Can we achieve code sharing in a reasonable way between tidy and rustfmt? Or should I just copy-paste that implementation?

@Noratrieb
Copy link
Member

Sharing the code is quite annoying due to how rustfmt is set up with its subtree, copying the code sounds better. It wouldn't be that bad if it diverges.

@folkertdev
Copy link
Contributor Author

I think because this algorithm was designed for rustfmt, it has some weird behavior for things that are not rust identifiers. For instance it sorts these examples like so

foo_bar
foo-bar

Contrary to alphabetical sorting; _ is sorted before all non-whitespace characters. This also causes this:

        rustc_ast_lowering = { path = \"../rustc_ast_lowering\" }
        rustc_ast = { path = \"../rustc_ast\" }

So, idk, this algorithm has some nice features, but I think it needs some tweaking for this use case.

@folkertdev folkertdev force-pushed the tidy-natural-sort branch from 1f057ed to f81c2f0 Compare May 21, 2025 11:32
@rustbot rustbot added the O-windows Operating system: Windows label May 21, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 21, 2025

Some changes occurred to the CTFE machinery

cc @RalfJung, @oli-obk, @lcnr

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

@folkertdev
Copy link
Contributor Author

After looking at this a bit, I think for the sort of thing that tidy is used for, the segment-based sorting method is a poor fit. The method really overfits on sequences of rust identifiers, and in many cases produces an ordering that is different from standard string sort. Staying as close a possible to standard string sort is convenient (e.g. you can use your editor to sort lines).

So, I've updated the algorithm to 1) consider leading zeros and 2) perform a case-sensitive sort. In most cases that means your editor's sorting functionality will produce the correct result, except when there are numbers in the identifiers.

Comment on lines 298 to +293
// FEAT_SME_F8F32
("sme-f8f32", Unstable(sym::aarch64_unstable_target_feature), &["sme2", "fp8"]),
// FEAT_SME_F16F16
("sme-f16f16", Unstable(sym::aarch64_unstable_target_feature), &["sme2"]),
// FEAT_SME_F64F64
("sme-f64f64", Unstable(sym::aarch64_unstable_target_feature), &["sme"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice improvement.

@saethlin
Copy link
Member

saethlin commented May 21, 2025

FWIW I don't use my editor to sort these things, I just do a decent enough job by eye then comply when tidy tells me to change something. So any change that looks more sensible and/or facilitates eyeball-binary-search makes me happy.

@rustbot
Copy link
Collaborator

rustbot commented Jun 25, 2025

There are changes to the tidy tool.

cc @jieyouxu

@folkertdev
Copy link
Contributor Author

Given that Nora seems busy and @jieyouxu gets pinged for changes to tidy

r? @jieyouxu

@rustbot rustbot assigned jieyouxu and unassigned Noratrieb Jun 25, 2025
@rustbot
Copy link
Collaborator

rustbot commented Jun 25, 2025

jieyouxu is not on the review rotation at the moment.
They may take a while to respond.

Copy link
Member

@jieyouxu jieyouxu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'm not sure if this is always better (e.g. the bootstrap dist changes), but looking at the identifiers involving numbers, at least those are definitely better IMO.

@jieyouxu
Copy link
Member

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Jun 26, 2025

📌 Commit 1dfc840 has been approved by jieyouxu

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 26, 2025
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jun 26, 2025
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 26, 2025
…eyouxu

make `tidy-alphabetical` use a natural sort

The idea here is that these lines should be correctly sorted, even though a naive string comparison would say they are not:

```
foo2
foo10
```

This is the ["natural sort order"](https://en.wikipedia.org/wiki/Natural_sort_order).

There is more discussion in [#t-compiler/help > tidy natural sort](https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/tidy.20natural.20sort/with/519111079)

Unfortunately, no standard sorting tools are smart enough to to this automatically (casting some doubt on whether we should make this change). Here are some sort outputs:

```
> cat foo.txt | sort
foo
foo1
foo10
foo2
mp
mp1e2
np",
np1e2",
> cat foo.txt | sort -n
foo
foo1
foo10
foo2
mp
mp1e2
np",
np1e2",
> cat foo.txt | sort -V
foo
foo1
foo2
foo10
mp
mp1e2
np1e2",
np",
```

Disappointingly, "numeric" sort does not actually have the behavior we want. It only sorts by numeric value if the line starts with a number. The "version" sort looks promising, but does something very unintuitive if you look at the final 4 values. None of the other options seem to have the desired behavior in all cases:

```
  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           shuffle, but group identical keys.  See shuf(1)
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text
```

r? `@Noratrieb` (it sounded like you know this code?)
bors added a commit that referenced this pull request Jun 26, 2025
Rollup of 17 pull requests

Successful merges:

 - #124595 (Suggest cloning `Arc` moved into closure)
 - #139594 (Simplify `ObligationCauseCode::IfExpression`)
 - #141311 (make `tidy-alphabetical` use a natural sort)
 - #141648 ([rustdoc] Do not emit redundant_explicit_links lint if the doc comment comes from expansion)
 - #142255 (Add edition checks for some tests that had divergent output)
 - #142285 (tests: Do not run afoul of asm.validity.non-exhaustive in input-stats)
 - #142549 (small iter.intersperse.fold() optimization)
 - #142637 (Remove some glob imports from the type system)
 - #142647 ([perf] Compute hard errors without diagnostics in impl_intersection_has_impossible_obligation)
 - #142700 (Remove incorrect comments in `Weak`)
 - #142884 (StableMIR: Add method to retrieve body of coroutine)
 - #142925 (Rewrite `.gitattributes` CRLF ui tests into run-make tests)
 - #143001 (Rename run always )
 - #143010 (Update `browser-ui-test` version to `0.20.7`)
 - #143015 (Add `sym::macro_pin` diagnostic item for `core::pin::pin!()`)
 - #143020 (codegen_fn_attrs: make comment more precise)
 - #143033 (Expand const-stabilized API links in relnotes)

r? `@ghost`
`@rustbot` modify labels: rollup
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jun 26, 2025
…eyouxu

make `tidy-alphabetical` use a natural sort

The idea here is that these lines should be correctly sorted, even though a naive string comparison would say they are not:

```
foo2
foo10
```

This is the ["natural sort order"](https://en.wikipedia.org/wiki/Natural_sort_order).

There is more discussion in [#t-compiler/help > tidy natural sort](https://rust-lang.zulipchat.com/#narrow/channel/182449-t-compiler.2Fhelp/topic/tidy.20natural.20sort/with/519111079)

Unfortunately, no standard sorting tools are smart enough to to this automatically (casting some doubt on whether we should make this change). Here are some sort outputs:

```
> cat foo.txt | sort
foo
foo1
foo10
foo2
mp
mp1e2
np",
np1e2",
> cat foo.txt | sort -n
foo
foo1
foo10
foo2
mp
mp1e2
np",
np1e2",
> cat foo.txt | sort -V
foo
foo1
foo2
foo10
mp
mp1e2
np1e2",
np",
```

Disappointingly, "numeric" sort does not actually have the behavior we want. It only sorts by numeric value if the line starts with a number. The "version" sort looks promising, but does something very unintuitive if you look at the final 4 values. None of the other options seem to have the desired behavior in all cases:

```
  -b, --ignore-leading-blanks  ignore leading blanks
  -d, --dictionary-order      consider only blanks and alphanumeric characters
  -f, --ignore-case           fold lower case to upper case characters
  -g, --general-numeric-sort  compare according to general numerical value
  -i, --ignore-nonprinting    consider only printable characters
  -M, --month-sort            compare (unknown) < 'JAN' < ... < 'DEC'
  -h, --human-numeric-sort    compare human readable numbers (e.g., 2K 1G)
  -n, --numeric-sort          compare according to string numerical value
  -R, --random-sort           shuffle, but group identical keys.  See shuf(1)
      --random-source=FILE    get random bytes from FILE
  -r, --reverse               reverse the result of comparisons
      --sort=WORD             sort according to WORD:
                                general-numeric -g, human-numeric -h, month -M,
                                numeric -n, random -R, version -V
  -V, --version-sort          natural sort of (version) numbers within text
```

r? ``@Noratrieb`` (it sounded like you know this code?)
bors added a commit that referenced this pull request Jun 26, 2025
Rollup of 9 pull requests

Successful merges:

 - #124595 (Suggest cloning `Arc` moved into closure)
 - #139594 (Simplify `ObligationCauseCode::IfExpression`)
 - #141311 (make `tidy-alphabetical` use a natural sort)
 - #141648 ([rustdoc] Do not emit redundant_explicit_links lint if the doc comment comes from expansion)
 - #142285 (tests: Do not run afoul of asm.validity.non-exhaustive in input-stats)
 - #142393 (Don't  give APITs names with macro expansion placeholder fragments in it)
 - #142884 (StableMIR: Add method to retrieve body of coroutine)
 - #142981 (Make missing lifetime suggestion verbose)
 - #143030 (Fix suggestion spans inside macros for the `unused_must_use` lint)

r? `@ghost`
`@rustbot` modify labels: rollup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tidy Area: The tidy tool O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants