Description
Clang generates suboptimal code for the following:
struct S {
int& x;
int& y;
bool check() {
return x < y;
}
};
[[noreturn]] [[gnu::cold]] void bar(const S& s);
void foo(int a, int b) {
S s{a, b};
if(s.check()) [[unlikely]] { // very unlikely and cold
bar(s);
}
}
foo(int, int):
sub rsp, 24
mov dword ptr [rsp + 4], edi
mov dword ptr [rsp], esi
lea rax, [rsp + 4]
mov qword ptr [rsp + 8], rax
mov rax, rsp
mov qword ptr [rsp + 16], rax
cmp edi, esi
jl .LBB0_2
add rsp, 24
ret
.LBB0_2:
lea rdi, [rsp + 8]
call bar(S const&)@PLT
Struct S must be on the stack in order to call bar(); however, that is only needed in the unlikely case that the branch is taken. Ideally the codegen would be the following:
foo(int, int):
cmp edi, esi
jl .LBB0_2
ret
.LBB0_2:
sub rsp, 24
... copy edi/esi to the stack and make struct S ...
call bar(S const&)@PLT
MSVC generates something along these lines; GCC and Clang do not: https://godbolt.org/z/4axKfoe8x
I can't simply write if(a < b) or delay the construction of S until inside the branch. My specific use case that results in code like this is libassert, where an expression template is built from the user's condition, and that expression template is evaluated and inspected during assertion failure.
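To illustrate why the object has to exist before the check, here is a heavily simplified sketch of the expression-template pattern. This is not libassert's actual implementation: the names decomposer, lhs_holder, less_expr, fail, and checked are all hypothetical, and the handler throws only so the sketch can be exercised.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical expression template: holds references to both operands
// so they can be inspected later if the check fails.
template <typename A, typename B>
struct less_expr {
    const A& lhs;
    const B& rhs;
    bool check() const { return lhs < rhs; }
};

// Hypothetical tag type that starts the decomposition.
struct decomposer {};

// Captures the left operand; operator< then captures the right one.
template <typename A>
struct lhs_holder {
    const A& lhs;
    template <typename B>
    less_expr<A, B> operator<(const B& rhs) const { return {lhs, rhs}; }
};

template <typename A>
lhs_holder<A> operator<<(decomposer, const A& lhs) { return {lhs}; }

// Cold failure handler (assumption: throws instead of aborting, to keep
// the sketch testable). It needs the expression object by reference.
template <typename A, typename B>
[[noreturn]] [[gnu::cold]] void fail(const less_expr<A, B>& e) {
    throw std::runtime_error("assertion failed: " + std::to_string(e.lhs) +
                             " < " + std::to_string(e.rhs));
}

// What an assertion on `a < b` might boil down to: the expression object is
// built first, checked, and only handed to the handler on the cold path.
void checked(int a, int b) {
    auto e = (decomposer{} << a) < b;
    if (!e.check()) [[unlikely]] {
        fail(e);
    }
}
```

Because the check goes through the same object that the failure handler later inspects, the comparison cannot simply be hoisted out in source, which is the situation described above.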
Even if the code is written as follows, Clang still generates suboptimal code:
void foo(int a, int b) {
if(a < b) [[unlikely]] { // very unlikely and cold
S s{a, b};
bar(s);
}
}
foo(int, int):
sub rsp, 24
mov dword ptr [rsp + 4], edi
mov dword ptr [rsp], esi
cmp edi, esi
jl .LBB0_2
add rsp, 24
ret
.LBB0_2:
lea rax, [rsp + 4]
mov qword ptr [rsp + 8], rax
mov rax, rsp
mov qword ptr [rsp + 16], rax
lea rdi, [rsp + 8]
call bar(S const&)@PLT
This may be a tricky optimization to perform. However, given the use case above, I expect it would benefit a large amount of code.
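For cases where the comparison can be written directly, one possible source-level mitigation is to outline the cold path by hand into a function that takes the operands by value, so that taking their addresses happens only inside the outlined function. The sketch below is an assumption-laden illustration, not a verified fix: bar_outlined is a hypothetical helper, the body of bar is a stand-in that throws so the example can run, and the resulting assembly has not been checked here. It also does not help the expression-template case, where construction cannot be delayed.

```cpp
#include <cassert>
#include <stdexcept>

struct S {
    int& x;
    int& y;
    bool check() { return x < y; }
};

// Stand-in definition for the issue's bar(); the real one is only declared.
[[noreturn]] [[gnu::cold]] void bar(const S& s) {
    (void)s;
    throw std::runtime_error("cold path reached");
}

// Hypothetical helper: receives the values by value, then builds S locally.
// The stack slots that S refers to belong to this function, not to foo().
[[noreturn]] [[gnu::cold]] [[gnu::noinline]] void bar_outlined(int a, int b) {
    S s{a, b};
    bar(s);
}

void foo(int a, int b) {
    if (a < b) [[unlikely]] {  // very unlikely and cold
        bar_outlined(a, b);
    }
}
```

The intent is that foo() itself never takes the address of a or b, leaving the compiler free to keep them in registers on the hot path.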