Bmi2.MultiplyNoFlags issues #11782

@pentp

In a branch, I have a working version of decimal.DecCalc that uses MultiplyNoFlags (from dotnet/coreclr#21480), but I discovered two issues.

The first is that if MultiplyNoFlags is called without its return value (the high part) being used, the call is assumed to be a no-op, even if the low part is used. While such use would be sub-optimal, it should still be valid (fixed in dotnet/coreclr#21928).
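
For illustration, a minimal sketch of the kind of call this refers to: only the low half is consumed and the returned high half is ignored. The class and method names are hypothetical (not taken from the branch), and the fallback path is only there so the sketch also runs on hardware without BMI2.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

static class MulxSketch
{
    // Hypothetical helper: returns only the low 64 bits of a * b.
    // The high half returned by MultiplyNoFlags is deliberately discarded,
    // which is the pattern described above.
    public static unsafe ulong LowOnly(ulong a, ulong b)
    {
        ulong low;
        if (Bmi2.X64.IsSupported)
        {
            // Return value (high half) intentionally unused.
            Bmi2.X64.MultiplyNoFlags(a, b, &low);
        }
        else
        {
            // Software fallback: a wrapping 64-bit multiply gives the same low half.
            low = unchecked(a * b);
        }
        return low;
    }
}
```

A plain `a * b` is obviously the better way to get the low half; the point is only that the call above should not be dropped.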

The second problem is that performance improves by at most 3% for some methods, while others suffer a penalty of up to 20%! This is primarily caused by the low result being forced to memory and by excessive temporary register use, compounded by forced zero-init of the locals (even without .locals init), which affects all code paths of the function.

static unsafe ulong mulx(ulong a, ulong b)
{
	ulong r;
	return X86.Bmi2.X64.MultiplyNoFlags(a, b, &r) + r;
}
; Assembly listing for method DecCalc:mulx(long,long):long
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  3   )    long  ->  rcx        
;  V01 arg1         [V01,T01] (  3,  3   )    long  ->  [rsp+0x18]  
;  V02 loc0         [V02    ] (  2,  2   )    long  ->  [rsp+0x00]   do-not-enreg[X] must-init addr-exposed ld-addr-op
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]   "OutgoingArgSpace"
;
; Lcl frame size = 8

G_M42317_IG01:
       push     rax
       xor      rax, rax
       mov      qword ptr [rsp], rax
       mov      qword ptr [rsp+18H], rdx

G_M42317_IG02:
       lea      rax, bword ptr [rsp]
       mov      rdx, rcx
       mulx     rdx, r8, qword ptr [rsp+18H]
       mov      qword ptr [rax], r8
       mov      rax, rdx
       add      rax, qword ptr [rsp]

G_M42317_IG03:
       add      rsp, 8
       ret      

; Total bytes of code 41, prolog size 7 for method DecCalc:mulx(long,long):long

Ideally, this should be just:

       mulx     rax, rcx, rcx
       add      rax, rcx
       ret      
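
For checking results while experimenting with the codegen, a reference version without the intrinsic can be handy. This is a sketch that assumes the Math.BigMul(ulong, ulong, out ulong) overload is available (.NET 5 or later); the class and method names are hypothetical.

```csharp
using System;

static class MulxReference
{
    // Hypothetical reference for the mulx test method above:
    // computes the 128-bit product of a and b, then returns
    // high half + low half with wrap-around on overflow.
    public static ulong HighPlusLow(ulong a, ulong b)
    {
        ulong high = Math.BigMul(a, b, out ulong low);
        return unchecked(high + low);
    }
}
```

The value returned here should match what the mulx method returns for the same inputs, whichever instruction sequence the JIT ends up emitting.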

category:cq
theme:vector-codegen
skill-level:expert
cost:medium
impact:small
