-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Optimize char::is_alphanumeric
#145027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize char::is_alphanumeric
#145027
Conversation
Avoid an unnecessary call to `unicode::Alphabetic` when `self` is an ASCII digit (ie `0..=9`).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this does seem plausible, but also it makes things slower for things that aren't actually ASCII because there's another check.
Do you have some benchmark results showing that it's at least still better on mixed text? I think there's some data in the core benches that can be used to show that on a chinese wikipedia page or similar.
Relatedly, if this is worth doing it feels like it's probably worth doing everywhere, and in the process updating the data generator to not include the ascii cases, requiring instead that the needle
passed to skip_search
isn't ascii.
if self.is_ascii() { | ||
self.is_ascii_alphanumeric() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsure: with https://github.com/rust-lang/rust/pull/143467/files#diff-5f02a23b56d56296cfa0b9f2dc32075d26638c5aa232c4aa551131785dad8887R759 about to land, should this instead go via that?
if self.is_ascii() { | |
self.is_ascii_alphanumeric() | |
if let Some(a) = self.as_ascii() { | |
a.is_alphanumeric() |
(Don't know how much it matters, but I always prefer using types to avoid boolean blindness, where possible.)
I actually already have a WIP branch that does just that as well as some other tricks to reduce the size of the unicode tables. I will submit it when it is ready |
You know what, looking again @bors r+ |
Rollup of 23 pull requests Successful merges: - #141658 (rustdoc search: prefer stable items in search results) - #141828 (Add diagnostic explaining STATUS_STACK_BUFFER_OVERRUN not only being used for stack buffer overruns if link.exe exits with that exit code) - #144823 (coverage: Extract HIR-related helper code out of the main module) - #144883 (Remove unneeded `drop_in_place` calls) - #144923 (Move several more float tests to floats/mod.rs) - #144988 (Add annotations to the graphviz region graph on region origins) - #145010 (Couple of minor abi handling cleanups) - #145017 (Explicitly disable vector feature on s390x baseline of bad-reg test) - #145027 (Optimize `char::is_alphanumeric`) - #145050 (add member constraints tests) - #145073 (update enzyme submodule to handle llvm 21) - #145080 (Escape diff strings in MIR dataflow graphviz) - #145082 (Fix some bad formatting in `-Zmacro-stats` output.) - #145083 (Fix cross-compilation of Cargo) - #145096 (Fix wasm target build with atomics feature) - #145097 (remove unnecessary `TypeFoldable` impls) - #145100 (Rank doc aliases lower than equivalently matched items) - #145103 (rustc_metadata: remove unused private trait impls) - #145115 (defer opaque type errors, generally greatly reduce tainting) - #145119 (rustc_public: fix missing parenthesis in pretty discriminant) - #145124 (Recover `for PAT = EXPR {}`) - #145132 (Refactor map_unit_fn lint) - #145134 (Reduce indirect assoc parent queries) r? `@ghost` `@rustbot` modify labels: rollup
Rollup of 23 pull requests Successful merges: - #141658 (rustdoc search: prefer stable items in search results) - #141828 (Add diagnostic explaining STATUS_STACK_BUFFER_OVERRUN not only being used for stack buffer overruns if link.exe exits with that exit code) - #144823 (coverage: Extract HIR-related helper code out of the main module) - #144883 (Remove unneeded `drop_in_place` calls) - #144923 (Move several more float tests to floats/mod.rs) - #144988 (Add annotations to the graphviz region graph on region origins) - #145010 (Couple of minor abi handling cleanups) - #145017 (Explicitly disable vector feature on s390x baseline of bad-reg test) - #145027 (Optimize `char::is_alphanumeric`) - #145050 (add member constraints tests) - #145073 (update enzyme submodule to handle llvm 21) - #145080 (Escape diff strings in MIR dataflow graphviz) - #145082 (Fix some bad formatting in `-Zmacro-stats` output.) - #145083 (Fix cross-compilation of Cargo) - #145096 (Fix wasm target build with atomics feature) - #145097 (remove unnecessary `TypeFoldable` impls) - #145100 (Rank doc aliases lower than equivalently matched items) - #145103 (rustc_metadata: remove unused private trait impls) - #145115 (defer opaque type errors, generally greatly reduce tainting) - #145119 (rustc_public: fix missing parenthesis in pretty discriminant) - #145124 (Recover `for PAT = EXPR {}`) - #145132 (Refactor map_unit_fn lint) - #145134 (Reduce indirect assoc parent queries) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of #145027 - Kmeakin:km/optimize-char-is-alphanumeric, r=scottmcm Optimize `char::is_alphanumeric` Avoid an unnecessary call to `unicode::Alphabetic` when `self` is an ASCII digit (ie `0..=9`).
Hi bors this already merged |
@rustbot label: -S-waiting-on-author +S-waiting-on-bors +merged-by-bors |
Avoid an unnecessary call to
unicode::Alphabetic
whenself
is an ASCII digit (ie0..=9
).