[libclc] Optimize ceil/fabs/floor/rint/trunc #119596
Conversation
These functions all map to the corresponding LLVM intrinsics, but the
vector intrinsics weren't being generated. The intrinsic mapping from
CLC vector function to vector intrinsic was working correctly, but the
mapping from OpenCL builtin to CLC function was suboptimally recursively
splitting vectors in halves.
For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32`
directly.
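Roughly, the old splitting pattern looked like the sketch below (plain OpenCL C; `my_ceil` is an illustrative name, not the verbatim libclc source):

```c
/* Sketch of the recursive vector splitting: each wide overload is built from
 * two calls on the half-width vectors, so a float16 call reaches the backend
 * as a tree of narrower operations instead of one llvm.ceil.v16f32. */
__attribute__((overloadable)) float8 my_ceil(float8 x);

__attribute__((overloadable)) float16 my_ceil(float16 x) {
  /* x.lo and x.hi are the standard OpenCL accessors for the two float8 halves. */
  return (float16)(my_ceil(x.lo), my_ceil(x.hi));
}
```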
The CLC versions of each of these builtins are also now enabled for
SPIR-V targets. The LLVM -> SPIR-V translator maps the intrinsics to the
appropriate OpExtInst. As such, there is no diff to the SPIR-V binaries
before/after this change.
The clspv targets show a difference, but it's not expected to be a
problem:
```
> %call = tail call spir_func double @llvm.fabs.f64(double noundef %x) #9
< %call = tail call spir_func double @_Z4fabsd(double noundef %x) #9
```
The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN` macro to override `sqrt`, so those functions also appear more optimal with this change, calling the vector `llvm.sqrt.vXf32` intrinsics directly.
arsenm left a comment:
LGTM. I'm not sure how this all ends up expanding; I was expecting to see the elementwise builtins used.
It would be great if we had `update_cc_test_checks`-style testing for the resulting implementation.
Yes, I suspect that this code originates from before the builtins were available? The builtins would probably make more sense, tbh. The current method is that we have the OpenCL builtin call the corresponding CLC builtin, which in its header uses this strange `__asm` construct to map the function directly to the LLVM intrinsic.
Oh yes, I agree. My efforts to introduce testing stalled somewhat. Maybe we can pick up that discussion on #87989?
That's something that's always surprised me it works. It's rather unsafe (you can bypass immarg validation for instance). Plus asm callsites get infected with overly conservative attributes (like convergent, which you can't remove)
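For context, the `__asm` mechanism under discussion is roughly of this shape (an illustrative sketch, not a verbatim libclc header; `my_clc_floor` is a hypothetical name):

```c
/* An ordinary function declaration whose assembler name is forced to that of
 * an LLVM intrinsic, so call sites are emitted as calls to the intrinsic even
 * though clang never validates them as intrinsic calls. */
float my_clc_floor(float x) __asm("llvm.floor.f32");

float call_floor(float x) {
  /* Lowers to a call to @llvm.floor.f32, but with whatever conservative
   * call-site attributes an unknown external function call would get. */
  return my_clc_floor(x);
}
```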
That's the simplest way to go.
Yeah, good point.
I've updated the patch to do just that, using the builtins. I'll update the description accordingly.
These functions all map to the corresponding LLVM intrinsics, but the vector intrinsics weren't being generated. The intrinsic mapping from CLC vector function to vector intrinsic was working correctly, but the mapping from OpenCL builtin to CLC function was suboptimally recursively splitting vectors in halves.
For example, with this change, `ceil(float16)` calls `llvm.ceil.v16f32` directly once optimizations are applied.

Also, instead of generating LLVM intrinsics through `__asm`, we now call clang elementwise builtins for each CLC builtin. This should be a more standard way of achieving the same result.

The CLC versions of each of these builtins are also now built and enabled for SPIR-V targets. The LLVM -> SPIR-V translator maps the intrinsics to the appropriate OpExtInst, so there should be no difference in semantics, despite the newly introduced indirection from OpenCL builtin through the CLC builtin to the intrinsic.
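As a sketch of the new approach (illustrative function names; not the verbatim libclc sources), each CLC-level function can now simply forward to the corresponding clang elementwise builtin:

```c
/* __builtin_elementwise_ceil accepts scalar and vector floating-point
 * arguments and lowers to the matching llvm.ceil.* intrinsic, so the float16
 * overload becomes a single llvm.ceil.v16f32 call. */
__attribute__((overloadable)) float my_clc_ceil(float x) {
  return __builtin_elementwise_ceil(x);
}

__attribute__((overloadable)) float16 my_clc_ceil(float16 x) {
  return __builtin_elementwise_ceil(x);
}
```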
The AMDGPU targets make use of the same `_CLC_DEFINE_UNARY_BUILTIN` macro to override `sqrt`, so those functions also appear more optimal with this change, calling the vector `llvm.sqrt.vXf32` intrinsics directly.
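A rough illustration of how such a macro-based override ends up at the vector intrinsics (the macro shown here is a guess at the general shape, not the actual `_CLC_DEFINE_UNARY_BUILTIN` definition, and `my_sqrt` is a hypothetical name):

```c
/* Hypothetical macro in the spirit of _CLC_DEFINE_UNARY_BUILTIN: define one
 * overload of BUILTIN that forwards to FUNCTION. */
#define MY_DEFINE_UNARY_BUILTIN(RET_TYPE, BUILTIN, FUNCTION, ARG_TYPE)        \
  __attribute__((overloadable)) RET_TYPE BUILTIN(ARG_TYPE x) {                \
    return FUNCTION(x);                                                       \
  }

/* Forwarding to the elementwise builtin lets a target's sqrt override lower
 * straight to llvm.sqrt.f32, llvm.sqrt.v4f32, and so on. */
MY_DEFINE_UNARY_BUILTIN(float, my_sqrt, __builtin_elementwise_sqrt, float)
MY_DEFINE_UNARY_BUILTIN(float4, my_sqrt, __builtin_elementwise_sqrt, float4)
```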