Optimize char::is_alphanumeric #145027

Merged
bors merged 1 commit into rust-lang:master from Kmeakin:km/optimize-char-is-alphanumeric
Aug 9, 2025

Conversation

Kmeakin
Contributor

@Kmeakin Kmeakin commented Aug 6, 2025

Avoid an unnecessary call to `unicode::Alphabetic` when `self` is an ASCII digit (i.e. `'0'..='9'`).
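
A minimal sketch of the fast path described above, written as a free function against the public `char` API. The name `is_alphanumeric_ascii_first` is an assumption made for illustration, and the non-ASCII branch calls the public `is_alphabetic` and `is_numeric` methods as a stand-in for the internal `unicode` table lookups inside `core`; it is not the code from this PR.

```rust
/// ASCII-first alphanumeric check (illustrative; not the code in `core`).
pub fn is_alphanumeric_ascii_first(c: char) -> bool {
    if c.is_ascii() {
        // ASCII letters and digits are decided by cheap range comparisons,
        // so no Unicode table is consulted on this path.
        c.is_ascii_alphanumeric()
    } else {
        // Stand-in for the table-backed Unicode property lookups.
        c.is_alphabetic() || c.is_numeric()
    }
}
```

For every `char`, this should agree with `char::is_alphanumeric`; only the amount of work done for ASCII input differs.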
@rustbot
Collaborator

rustbot commented Aug 6, 2025

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 6, 2025
Member

@scottmcm scottmcm left a comment

So this does seem plausible, but it also makes things slower for input that isn't actually ASCII, because there's another check.

Do you have some benchmark results showing that it's at least still better on mixed text? I think there's some data in the core benches that can be used to show that on a Chinese Wikipedia page or similar.
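
A rough sketch of how such a mixed-text comparison could be timed outside the tree. All names here (`alnum_plain`, `alnum_ascii_first`, `time_count`) are hypothetical, and both predicates are built from public `char` methods that may themselves contain ASCII shortcuts, so this only approximates the before/after difference. The in-tree core benches remain the authoritative measurement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Stand-in for a version with no up-front ASCII branch.
pub fn alnum_plain(c: char) -> bool {
    c.is_alphabetic() || c.is_numeric()
}

/// ASCII-first check, mirroring the shape of the change under review.
pub fn alnum_ascii_first(c: char) -> bool {
    if c.is_ascii() {
        c.is_ascii_alphanumeric()
    } else {
        c.is_alphabetic() || c.is_numeric()
    }
}

/// Counts matching chars in `text`, repeated `rounds` times, and returns
/// the count from the last pass together with the total elapsed time.
pub fn time_count(text: &str, rounds: u32, pred: fn(char) -> bool) -> (usize, Duration) {
    let start = Instant::now();
    let mut count = 0;
    for _ in 0..rounds {
        // `black_box` discourages the optimizer from removing the loop.
        count = black_box(text).chars().filter(|&c| pred(c)).count();
    }
    (black_box(count), start.elapsed())
}
```

Calling `time_count` with the same text and each predicate in turn gives two durations to compare; the counts should match, which doubles as a sanity check that both predicates agree on the input.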

Relatedly, if this is worth doing, it feels like it's probably worth doing everywhere, and in the process updating the data generator to not include the ASCII cases, requiring instead that the needle passed to `skip_search` isn't ASCII.
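
The idea of keeping ASCII out of the table and making "needle is not ASCII" a precondition can be sketched as follows. The table contents, `lookup_non_ascii`, and `is_letter_sketch` are hypothetical, and the lookup is a plain binary search over ranges; it does not reproduce the compact encoding that `skip_search` uses in `core`.

```rust
use std::cmp::Ordering;

/// Toy range table covering only a few Latin-1 letters. It is deliberately
/// incomplete and contains no ASCII entries (hypothetical data for the sketch).
const NON_ASCII_LETTERS: &[(char, char)] = &[
    ('\u{C0}', '\u{D6}'), // À..=Ö
    ('\u{D8}', '\u{F6}'), // Ø..=ö
    ('\u{F8}', '\u{FF}'), // ø..=ÿ
];

/// Table lookup that requires a non-ASCII needle, so the table can omit ASCII.
fn lookup_non_ascii(c: char) -> bool {
    debug_assert!(!c.is_ascii(), "callers must handle ASCII before the table");
    NON_ASCII_LETTERS
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                Ordering::Less
            } else if lo > c {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

/// Entry point: ASCII is answered inline, everything else goes to the table.
pub fn is_letter_sketch(c: char) -> bool {
    if c.is_ascii() {
        c.is_ascii_alphabetic()
    } else {
        lookup_non_ascii(c)
    }
}
```

Because every caller filters ASCII first, the generated data never needs entries below U+0080, which is where the table-size saving would come from.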

Comment on lines +923 to +924
    if self.is_ascii() {
        self.is_ascii_alphanumeric()
Member

unsure: with https://github.com/rust-lang/rust/pull/143467/files#diff-5f02a23b56d56296cfa0b9f2dc32075d26638c5aa232c4aa551131785dad8887R759 about to land, should this instead go via that?

Suggested change
-    if self.is_ascii() {
-        self.is_ascii_alphanumeric()
+    if let Some(a) = self.as_ascii() {
+        a.is_alphanumeric()

(Don't know how much it matters, but I always prefer using types to avoid boolean blindness, where possible.)
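
To illustrate the "types instead of booleans" point on stable Rust, here is a sketch using a hypothetical `AsciiByte` newtype in place of the ASCII character type returned by `char::as_ascii`, which this sketch assumes is not available on stable. `AsciiByte`, `from_char`, and `is_alphanumeric_typed` are all invented names for illustration.

```rust
/// Hypothetical stand-in for an ASCII character type: holding one proves
/// the value is in `0x00..=0x7F`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiByte(u8);

impl AsciiByte {
    /// Returns `Some` only for ASCII input, so the check and its result
    /// travel together instead of living in a separate `bool`.
    pub fn from_char(c: char) -> Option<AsciiByte> {
        if c.is_ascii() {
            Some(AsciiByte(c as u8))
        } else {
            None
        }
    }

    /// ASCII-only classification; no Unicode tables involved.
    pub fn is_alphanumeric(self) -> bool {
        self.0.is_ascii_alphanumeric()
    }
}

pub fn is_alphanumeric_typed(c: char) -> bool {
    if let Some(a) = AsciiByte::from_char(c) {
        a.is_alphanumeric()
    } else {
        // Stand-in for the table-backed Unicode property lookups.
        c.is_alphabetic() || c.is_numeric()
    }
}
```

With the boolean form, nothing stops the ASCII-only method from being called on the wrong branch; with the typed form, the ASCII-only method is reachable only through a value that was already checked.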

@scottmcm scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 7, 2025
@Kmeakin
Contributor Author

Kmeakin commented Aug 9, 2025

> Relatedly, if this is worth doing, it feels like it's probably worth doing everywhere, and in the process updating the data generator to not include the ASCII cases, requiring instead that the needle passed to `skip_search` isn't ASCII.

I actually already have a WIP branch that does just that, as well as some other tricks to reduce the size of the Unicode tables. I will submit it when it is ready.

@scottmcm
Member

scottmcm commented Aug 9, 2025

You know what, looking again, `is_uppercase`, `is_whitespace`, and such essentially already have this, so we might as well just do it.

@bors r+

@bors
Collaborator

bors commented Aug 9, 2025

📌 Commit bf50209 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 9, 2025
bors added a commit that referenced this pull request Aug 9, 2025
Rollup of 23 pull requests

Successful merges:

 - #141658 (rustdoc search: prefer stable items in search results)
 - #141828 (Add diagnostic explaining STATUS_STACK_BUFFER_OVERRUN not only being used for stack buffer overruns if link.exe exits with that exit code)
 - #144823 (coverage: Extract HIR-related helper code out of the main module)
 - #144883 (Remove unneeded `drop_in_place` calls)
 - #144923 (Move several more float tests to floats/mod.rs)
 - #144988 (Add annotations to the graphviz region graph on region origins)
 - #145010 (Couple of minor abi handling cleanups)
 - #145017 (Explicitly disable vector feature on s390x baseline of bad-reg test)
 - #145027 (Optimize `char::is_alphanumeric`)
 - #145050 (add member constraints tests)
 - #145073 (update enzyme submodule to handle llvm 21)
 - #145080 (Escape diff strings in MIR dataflow graphviz)
 - #145082 (Fix some bad formatting in `-Zmacro-stats` output.)
 - #145083 (Fix cross-compilation of Cargo)
 - #145096 (Fix wasm target build with atomics feature)
 - #145097 (remove unnecessary `TypeFoldable` impls)
 - #145100 (Rank doc aliases lower than equivalently matched items)
 - #145103 (rustc_metadata: remove unused private trait impls)
 - #145115 (defer opaque type errors, generally greatly reduce tainting)
 - #145119 (rustc_public: fix missing parenthesis in pretty discriminant)
 - #145124 (Recover `for PAT = EXPR {}`)
 - #145132 (Refactor map_unit_fn lint)
 - #145134 (Reduce indirect assoc parent queries)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 2a7354e into rust-lang:master Aug 9, 2025
10 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Aug 9, 2025
rust-timer added a commit that referenced this pull request Aug 9, 2025
Rollup merge of #145027 - Kmeakin:km/optimize-char-is-alphanumeric, r=scottmcm

Optimize `char::is_alphanumeric`

Avoid an unnecessary call to `unicode::Alphabetic` when `self` is an ASCII digit (i.e. `'0'..='9'`).
@jieyouxu
Member

jieyouxu commented Aug 9, 2025

Hi bors, this has already merged.
@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 9, 2025
@jieyouxu
Member

jieyouxu commented Aug 9, 2025

@rustbot label: -S-waiting-on-author +S-waiting-on-bors +merged-by-bors

@rustbot rustbot added merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 9, 2025
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
5 participants