-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Speed up Convert.ToBase64String() #69884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
|
Tagging subscribers to this area: @dotnet/area-system-text-encoding Issue DetailsIt is 30% performance optimization: Benchmarked on 1000 random byte arrays of lengths 0..999 Used tricks:
|
Co-authored-by: Jeff Handley <[email protected]>
| j += 4; | ||
| a = *inData++; | ||
| b = *inData++; | ||
| *outPairs++ = base64Pairs[(a << 4) | (b >> 4)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding
Why is the codegen soo different to LLVM? https://godbolt.org/z/T8eqE9chv
LLVM
shr sil, 4
shl dil, 4
or dil, sil
movzx eax, dil
ret.NET JIT
movzx rax, dil
shl eax, 4
movzx rdi, sil
sar edi, 4
or eax, edi
ret There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deeprobin feel free to file an issue. Related one is #13816 (you can use PR that closed it as a foundation for your PR if you want to contribute 🙂)
|
There was a PR to use SSE for this API - dotnet/coreclr#21833 |
Why didn't we finish it? |
It got lost during coreclr->runtime migration 🙂 I can port it to crossplat intrinsics after(if) this PR lands to avoid conflicts |
A quick skim of that PR suggests it doesn't require a 16K lookup table? If that's the case, with appreciation to Sergei, I'd prefer we just start with that PR as something that's faster and smaller. |
I agree, vector intrinsics are always preferable. This 16K-price implementation may be fallback for platforms without SSE/AVX-like instructions. |
|
@EgorBo, are you still going to look at bringing that PR back to life? |
Sure, will take a look in an hour |
So I took a look - there was a small bug in the impl, but the problem that it won't be trivial to extend it to support Arm - it has to many non-shared intrinsics, I'll try to port it in coming days |
|
@EgorBo, @SergeiPavlov, can this be closed now that @EgorBo's change went in? |
|
Yes. The function is optimized by a959c3e |
|
Thanks, @SergeiPavlov. |
It is 30% performance optimization:
Benchmarked on 1000 random byte arrays of lengths 0..999
Used tricks:
intto process two char values simultaneously.inDataarray twice.*ptr++instead ofptrBase[index]; ++index;. We are savingaddinstruction.offsetparameter fromConvertToBase64Array().inDataisarray base + offset.insertLineBreaksimpacts on number of loop rounds before next complex condition check.BitConverter.IsLittleEndianfor correct work on Big-endian platforms.