Skip to content

Conversation

@kunalspathak
Copy link
Contributor

@kunalspathak kunalspathak commented Jun 2, 2023

Extending #85619 and #82973 for linux.

       mov      rdi, 0x7FA9DB182F28      ; GOT descriptor address of `maxThreadStaticBlocks`
       mov      rax, 0x7FA9DB7A5DB0      ; function address of __tls_get_addr
       call     rax                      ; gets the address of `maxThreadStaticBlocks`
       cmp      dword ptr [rax], 2
       jl       SHORT G_M8151_IG06
       mov      rdi, qword ptr [rax+08H]   ; access the `threadStaticBlocks` variable
       mov      rdi, qword ptr [rdi+10H]
       test     rdi, rdi
       je       SHORT G_M8151_IG06
       mov      rax, bword ptr [rdi]
       add      rax, 16

Base:

|          Method |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|---------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| GetThreadStatic | 3.383 ns | 0.0378 ns | 0.0316 ns | 3.374 ns | 3.346 ns | 3.459 ns |         - |
| SetThreadStatic | 5.334 ns | 0.0733 ns | 0.0650 ns | 5.338 ns | 5.261 ns | 5.451 ns |         - |

Diff:

|          Method |     Mean |     Error |    StdDev |   Median |      Min |      Max | Allocated |
|---------------- |---------:|----------:|----------:|---------:|---------:|---------:|----------:|
| GetThreadStatic | 1.048 ns | 0.0200 ns | 0.0177 ns | 1.041 ns | 1.026 ns | 1.087 ns |         - |
| SetThreadStatic | 2.844 ns | 0.0379 ns | 0.0354 ns | 2.837 ns | 2.807 ns | 2.915 ns |         - | 

This also adds mrs instruction in arm64 to access TLS on linux. The codegen is:

G_M8151_IG02:  
            mrs     x0, tpidr_el0
            ldr     w1, [x0, #0xE8]  ; `maxThreadStaticBlocks` value
            cmp     w1, #2
            blt     G_M8151_IG05
            ldr     x0, [x0, #0xF0]  ; `threadStaticBlocks` array
            ldr     x1, [x0, #0x10]
            cbz     x1, G_M8151_IG05

G_M8151_IG03:  
            mov     w0, #5
            str     w0, [x1, #0x24]

osx/arm64:

Because of register allocator doesn't reuse the x0 properly, we end up creating the same constant again in x1, but this is how osx/arm64 looks like:

            movz    x0, #0x8C00
            movk    x0, #817 LSL #16
            movk    x0, #1 LSL #32
            movz    x1, #0x8C00
            movk    x1, #817 LSL #16
            movk    x1, #1 LSL #32
            ldr     x1, [x1]
            blr     x1

osx/x64:

image

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2023
@ghost ghost assigned kunalspathak Jun 2, 2023
@ghost
Copy link

ghost commented Jun 2, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Extending #85619 and #82973 for linux.

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

JITDUMP("offsetOfThreadLocalStoragePointer= %u\n", threadStaticBlocksInfo.offsetOfThreadLocalStoragePointer);
JITDUMP("offsetOfMaxThreadStaticBlocks= %u\n", offsetOfMaxThreadStaticBlocksVal);
JITDUMP("offsetOfThreadStaticBlocks= %u\n", offsetOfThreadStaticBlocksVal);
JITDUMP("tlsIndex= %u\n", (ssize_t)threadStaticBlocksInfo.tlsIndex.addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dump threadStaticBlocksInfo.tlsIndex.accessType also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will be useful? I don't see we print accessType elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it will be useful? I don't see we print accessType elsewhere.

I don't know. Structurally, it seems odd to omit it. Up to you.

@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 1, 2023
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jul 3, 2023
@jakobbotsch jakobbotsch dismissed their stale review July 3, 2023 21:15

Addressed

@kunalspathak
Copy link
Contributor Author

Ping @BruceForstall

Copy link
Contributor

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w/ one nit

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants