Contributor

@gilescope gilescope commented Feb 14, 2021

I found out the other day that the low four bits of every ASCII digit hold the digit's value, just as one would hope (e.g. the char 2 ends in 0b0010). The two bits above them mark the digit range (0b0011_0000). For a true digit, all the bits above those two are 0 (as ASCII is the lowest part of the Unicode u32 spectrum). So XORing with 0b11_0000 should mean we get either the number 0-9 or, alternatively, a larger number in the u32 space. Anything that's not 0-9 is then discarded, because it is not less than the radix.
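
To make the bit trick concrete, here is a minimal sketch of the fast path (radix of 10 or less). The function name `to_digit_fast` is hypothetical and the constant's value is taken from the description above; this is an illustration, not necessarily the code that was merged.

```rust
/// High nibble shared by the ASCII digits '0'..='9' (0x30).
const ASCII_DIGIT_MASK: u32 = 0b11_0000;

/// Hypothetical helper: fast path only, so `radix` must be 10 or less.
fn to_digit_fast(c: char, radix: u32) -> Option<u32> {
    assert!(radix <= 10, "this sketch only covers radix <= 10");
    // For '0'..='9' the XOR clears the two marker bits, leaving 0..=9.
    // For every other char the result is at least 10, so the range
    // check below rejects it.
    let val = c as u32 ^ ASCII_DIGIT_MASK;
    if val < radix { Some(val) } else { None }
}

fn main() {
    assert_eq!(to_digit_fast('2', 10), Some(2));
    assert_eq!(to_digit_fast('8', 8), None); // a digit, but not in base 8
    assert_eq!(to_digit_fast(':', 10), None); // 0x3A ^ 0x30 == 10
    assert_eq!(to_digit_fast('a', 10), None); // 0x61 ^ 0x30 == 0x51
}
```

The `:` case shows why the final comparison matters: the six chars just after `9` XOR to 10 through 15, which only the range check filters out.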

The code seems so fast, though, that there's quite a lot of noise in the benchmarks, so it's not easy to prove conclusively that it's faster as well as fewer instructions.

I was toying with the non-fast path as well, wondering whether we could do the following, as we'd then have only one return and fewer instructions still:

    match self {
        'a'..='z' => self as u32 - 'a' as u32 + 10,
        'A'..='Z' => self as u32 - 'A' as u32 + 10,
        // Not a letter: lower the radix to 10 so that only '0'..='9'
        // can pass the final range check.
        _ => { radix = 10; self as u32 ^ ASCII_DIGIT_MASK }
    }
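
For context, the fragment above could sit in a complete function roughly like this. It is a sketch under assumptions: the name `to_digit_sketch`, the `mut radix` parameter, and the surrounding fast-path branch are illustrative and not confirmed as the merged implementation.

```rust
const ASCII_DIGIT_MASK: u32 = 0b11_0000;

/// Hypothetical stand-alone version with a single return at the end.
fn to_digit_sketch(c: char, mut radix: u32) -> Option<u32> {
    assert!(radix <= 36, "radix is too high (maximum 36)");
    let val = if radix <= 10 {
        // Fast path: non-digits XOR to a value of at least 10.
        c as u32 ^ ASCII_DIGIT_MASK
    } else {
        match c {
            'a'..='z' => c as u32 - 'a' as u32 + 10,
            'A'..='Z' => c as u32 - 'A' as u32 + 10,
            // Not a letter: with radix lowered to 10, chars such as ':'
            // (which XORs to 10) cannot slip through as a base-16 digit.
            _ => {
                radix = 10;
                c as u32 ^ ASCII_DIGIT_MASK
            }
        }
    };
    if val < radix { Some(val) } else { None }
}

fn main() {
    assert_eq!(to_digit_sketch('f', 16), Some(15));
    assert_eq!(to_digit_sketch('9', 16), Some(9));
    assert_eq!(to_digit_sketch(':', 16), None);
    assert_eq!(to_digit_sketch('g', 16), None);
}
```

Lowering the radix in the fallthrough arm is what lets the function keep one comparison and one return for all three arms.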

Here's the godbolt.

(H/T to @byteshadow for pointing out that XOR was what I needed.)

@rust-highfive
Contributor

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 14, 2021
@m-ou-se m-ou-se assigned m-ou-se and unassigned dtolnay Feb 14, 2021
@m-ou-se m-ou-se added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Feb 14, 2021
Co-authored-by: Mara <[email protected]>
Remove unused const
@m-ou-se
Member

m-ou-se commented Feb 15, 2021

@bors r+

@bors
Collaborator

bors commented Feb 15, 2021

📌 Commit d2ba68b has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 15, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 16, 2021
To digit simplification


Here's the [godbolt](https://godbolt.org/z/883c9n).

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 17, 2021
…laumeGomez

Rollup of 11 pull requests

Successful merges:

 - rust-lang#79981 (Add 'consider using' message to overflowing_literals)
 - rust-lang#82094 (To digit simplification)
 - rust-lang#82105 (Don't fail to remove files if they are missing)
 - rust-lang#82136 (Fix ICE: Use delay_span_bug for mismatched subst/hir arg)
 - rust-lang#82169 (Document that `assert!` format arguments are evaluated lazily)
 - rust-lang#82174 (Replace File::create and write_all with fs::write)
 - rust-lang#82196 (Add caveat to Path::display() about lossiness)
 - rust-lang#82198 (Use internal iteration in Iterator::is_sorted_by)
 - rust-lang#82204 (Update books)
 - rust-lang#82207 (rustdoc: treat edition 2021 as unstable)
 - rust-lang#82231 (Add long explanation for E0543)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 253631d into rust-lang:master Feb 17, 2021
@rustbot rustbot added this to the 1.52.0 milestone Feb 17, 2021