[pull] main from llvm:main #5643

pull · 2025-10-22T01:14:26Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


Without this patch, DenseMap and SmallDenseMap have distinct implementations of shrink_and_clear. These implementations mix a common high-level algorithm with class-specific logic. This patch moves the common algorithm into DenseMapBase::shrink_and_clear. A new private helper, planShrinkAndClear, now handles the class-specific logic for deciding whether to shrink the buffer. The base class method now serves as the single public entry point.
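The split described above (shared algorithm in the base class, a private per-class planning hook) can be sketched with a small CRTP example. This is only an illustration: `ToyMapBase`, `ToyHeapMap`, `ToyInlineMap`, their fields, and the shrink policies are hypothetical stand-ins, not the actual `DenseMapBase` code or the real `planShrinkAndClear` signature.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical CRTP base: the single public entry point with the shared algorithm.
template <typename Derived> class ToyMapBase {
public:
  void shrinkAndClear() {
    Derived &Self = static_cast<Derived &>(*this);
    const std::size_t OldEntries = Self.NumEntries;
    Self.NumEntries = 0; // "clear" step, identical for every derived map
    // Class-specific decision: a new capacity, or nullopt to keep the buffer.
    if (std::optional<std::size_t> NewCap = Self.planShrinkAndClear(OldEntries))
      Self.Capacity = *NewCap;
  }
};

// Hypothetical heap-backed map: shrink to about twice the old entry count.
class ToyHeapMap : public ToyMapBase<ToyHeapMap> {
  friend class ToyMapBase<ToyHeapMap>;
  std::optional<std::size_t> planShrinkAndClear(std::size_t OldEntries) const {
    std::size_t Wanted = OldEntries * 2;
    if (Wanted < 4)
      Wanted = 4; // assumed minimum capacity
    if (Wanted >= Capacity)
      return std::nullopt; // nothing to gain, keep the current buffer
    return Wanted;
  }

public:
  std::size_t NumEntries = 0;
  std::size_t Capacity = 64;
};

// Hypothetical small-size-optimized map: never plans below its inline capacity.
class ToyInlineMap : public ToyMapBase<ToyInlineMap> {
  friend class ToyMapBase<ToyInlineMap>;
  static constexpr std::size_t InlineCapacity = 8;
  std::optional<std::size_t> planShrinkAndClear(std::size_t OldEntries) const {
    std::size_t Wanted = OldEntries * 2;
    if (Wanted < InlineCapacity)
      Wanted = InlineCapacity;
    if (Wanted >= Capacity)
      return std::nullopt;
    return Wanted;
  }

public:
  std::size_t NumEntries = 0;
  std::size_t Capacity = 64;
};
```

Callers only ever see `shrinkAndClear()`; each derived class contributes just the decision of whether, and to what size, the buffer should shrink.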

…165067) Based on some recent discussion in #162007. Documenting this in the best practices page so we have something easy to point to in code review/reference for ourselves now that the repository has been cleaned up.

…ates (#164848) Source: Hacker's delight.

…165111)

#165112) Update `.Cases` and `.CasesLower` with 4+ args to use the `initializer_list` overload. The deprecation of these functions will come in a separate PR. For more context, see: #163405.

Based on #165067 (comment).

Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to enable code sharing.

RadixTree.h does not use anything from <limits>.

This patch replaces "typedef" with "type alias" in the comment while making it more concise.

This patch simplifies construction of iterator_range<T> by using: iterator_range<T>(Container &&) instead of: iterator_range<T>(T begin_iterator, T end_iterator)

We can use brace initializer lists to simplify constructors.

#163933) …consistent Since the bindings now use nanobind, I changed the code examples and mentions in the documentation prose to mention nanobind concepts and symbols wherever applicable. I also made the spelling of "Python" consistent by choosing the uppercase name everywhere that's not an executable name, part of a URL, or directory name. ---------------- Note that I left mentions of `PybindAdaptors.h` in because of #162309. Are there any thoughts about adding a virtual environment setup guide using [uv](https://docs.astral.sh/uv/)? It has gotten pretty popular, and is much faster than a "vanilla" Python pip install. It can also bootstrap an interpreter not present on the user's machine, for example a free-threaded Python build, with the `-p` flag to the `uv venv` virtual environment creation command.

Suggest the `initializer_list` overload instead. 3+ args is an arbitrary number that allows for incremental deprecation without having to update too many call sites. For more context, see #163117.

…64793) When SWIG is installed but not any Lua interpreter, the cmake script in `lldb/cmake/modules/FindLuaAndSwig.cmake` will execute `find_program(LUA_EXECUTABLE, ...)` and this will set the `LUA_EXECUTABLE` variable to `LUA_EXECUTABLE-NOTFOUND`. Ensure that in this case we are skipping the Lua tests requiring the interpreter.

Post cleanup for #164534.

…bstract PyOpView (#165053) #157930 changed `nb::object getOwner()` to `PyOpView getOwner()`, which implicitly constructs the generic OpView again from a (possibly) concrete OpView. This PR fixes that.

…164763) These issues affect only Debug builds, and Release builds with asserts enabled.
1. In `SparseTensor.h` a variable is moved-from within an assert, introducing a side effect that alters its subsequent use and causes divergence between Debug and Release builds (with asserts disabled).
2. In `IterationGraphSorter.cpp`, the class constructor arguments are moved-from to initialize class member variables via the initializer list. Because the arguments and class members are identically named, the arguments shadow their member-variable counterparts inside the constructor body. In the original code, unqualified names inside the asserts referred to the constructor arguments. This is wrong, because these have already been moved-from. It's not just UB, it is broken: these SmallVector types are reset when moved-from, i.e. the size resets to 0. This renders the affected asserts ineffective, since the comparisons operate on two hollowed-out objects and always succeed. The name ambiguity is fixed by using 'this->' to refer to the initialized member variables carrying the relevant state.
3. While fix 2 above made the asserts act as intended, it also unexpectedly broke one mlir test: `llvm-lit -v mlir/test/Dialect/SparseTensor/sparse_scalars.mlir`. This required fixing the assert logic itself, which likely has never worked and went unnoticed all this time due to bug 2. Specifically, in the failing test that uses `mlir/test/Dialect/SparseTensor/sparse_scalars.mlir`, the '%argq' of 'ins' is defined as 'f32' scalar type, but the original code inside the assert had no support for scalar types as written, and was breaking the test.

Testing:
```
ninja check-mlir
llvm-lit -v mlir/test/Dialect/SparseTensor/sparse_scalars.mlir
```
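The shadowing problem from item 2 can be reproduced in a few lines. The sketch below is not the `IterationGraphSorter` code: the `Sorter` class, its members, and the use of `std::vector` in place of `SmallVector` are assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class whose constructor parameters share names with its members.
class Sorter {
public:
  Sorter(std::vector<int> ins, std::vector<int> outs)
      : ins(std::move(ins)), outs(std::move(outs)) {
    // Inside this body, an unqualified `ins` names the constructor parameter,
    // which has already been moved-from; asserting on it would compare
    // hollowed-out objects. `this->ins` names the initialized member.
    assert(this->ins.size() == this->outs.size() &&
           "expected one output per input");
  }

  std::size_t numIns() const { return ins.size(); }

private:
  std::vector<int> ins;
  std::vector<int> outs;
};
```

Qualifying with `this->` is the smallest fix; renaming the parameters so they no longer collide with the members would remove the ambiguity as well.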

These were disabled when adjusting tests to work with the internal shell because the implementation on these systems of env did not support the -u option. Now that we have switched to the internal shell and env -u is implemented internally, these tests should work again.

…NFC (#165121) - On targets that don't require the Triple, don't pass it. - Use `.value_or` where possible.

…#164694) This reverts commit f1c1063. PR #163453 was merged and reverted since it exposed a crash. After investigation the crash was unrelated and is then fixed in #164628. This is an attempt to reland #163453.

…#165065) CreatePastEnd parameter had no effect on the label creation. Remove it.

… at the end of the line. (#165129) If we have a target where both # and ## are valid comment strings, a line ending in # would trigger the lexer to eat 2 characters and therefore lex the _next_ line as a comment. Oops. This was introduced in 4946db1 rdar://162635338
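The failure mode described above (assuming the longer `##` marker and consuming two characters when only `#` is present) can be illustrated with a small matcher that compares the actual characters before deciding how much to consume. This is a hypothetical sketch, not the lexer code touched by the commit; `commentMarkerLength` and its marker list are invented for illustration.

```cpp
#include <cstddef>
#include <string_view>

// Returns how many characters at the start of Rest form a comment marker,
// trying the longer marker first, or 0 if Rest does not start a comment.
// Hypothetical helper: the real lexer is structured differently.
constexpr std::size_t commentMarkerLength(std::string_view Rest) {
  constexpr std::string_view Markers[] = {"##", "#"};
  for (std::string_view M : Markers)
    if (Rest.substr(0, M.size()) == M)
      return M.size();
  return 0;
}

// A lone '#' followed by a newline consumes one character, so the newline
// (and therefore the next line) is left for normal lexing.
static_assert(commentMarkerLength("#\nnext line") == 1, "single marker");
static_assert(commentMarkerLength("## note") == 2, "double marker");
```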

…cl through parse (#164778) Instead of manually creating and adding a PTU, we should be able to use `RegisterPTU` which does the same job here.

Recently switched jobs. In practice this doesn't change much since I'm still in the security group to represent Rust, but I'm updating the actual company I work for to keep the list up to date.

…pVF (#156723) Transform TC and VF to same numerical space when they are different.

A conventional "if" statement is easier to read than the do-while(false) pattern used here.

The "if" statement being removed in this patch is identical to the "else" clause.

1. createHvxPrefixPred was computing an invalid byte count for small predicate types, leading to a crash during instruction selection. 2. HexagonTargetLowering::SplitHvxMemOp assumed the memory vector type is always simple. This patch adds a guard to avoid processing non-simple vector types, which can lead to failure. Patch By: Fateme Hosseini Co-authored-by: pavani karveti <[email protected]> Co-authored-by: Sergei Larin <[email protected]> Co-authored-by: Pavani Karveti <[email protected]>

Part of #102817. This patch optimizes `rng::generate_n` for segmented iterators by forwarding the implementation directly to `std::generate_n`.
- before
```
rng::generate_n(deque<int>)/32      21.7 ns   22.0 ns   32000000
rng::generate_n(deque<int>)/50      30.8 ns   30.7 ns   22400000
rng::generate_n(deque<int>)/1024     492 ns    488 ns    1120000
rng::generate_n(deque<int>)/8192    3938 ns   3924 ns     179200
```
- after
```
rng::generate_n(deque<int>)/32      11.0 ns   11.0 ns   64000000
rng::generate_n(deque<int>)/50      16.2 ns   16.1 ns   40727273
rng::generate_n(deque<int>)/1024     292 ns    286 ns    2240000
rng::generate_n(deque<int>)/8192    2291 ns   2302 ns     298667
```

…r switch lowering (#155910) Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the number of total comparisons (`NumCmps`) satisfy: `(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) || (NumDests == 3 && NumCmps >= 6)` However, in some cases on PowerPC, for example when NumDests is 3 and each destination has 2 comparisons, it is not profitable to lower the switch to a bit test. This patch adds an option to set the minimum of the largest number of comparisons required to use a bit test for switch lowering. --------- Co-authored-by: Shimin Cui <[email protected]>
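The quoted condition translates directly into a predicate. In the sketch below, the first function is the condition from the commit message; the second is only a guess at how a "minimum of the largest number of comparisons" threshold could be layered on top. The names `isSuitableWithMinCmps`, `LargestCmpsPerDest`, and `MinLargestCmps` are hypothetical, and the real option may be wired in differently.

```cpp
// Condition quoted in the commit message.
constexpr bool isSuitableForBitTests(unsigned NumDests, unsigned NumCmps) {
  return (NumDests == 1 && NumCmps >= 3) ||
         (NumDests == 2 && NumCmps >= 5) ||
         (NumDests == 3 && NumCmps >= 6);
}

// Hypothetical extension: additionally require that the destination with the
// most comparisons reaches a configurable minimum.
constexpr bool isSuitableWithMinCmps(unsigned NumDests, unsigned NumCmps,
                                     unsigned LargestCmpsPerDest,
                                     unsigned MinLargestCmps) {
  return isSuitableForBitTests(NumDests, NumCmps) &&
         LargestCmpsPerDest >= MinLargestCmps;
}

// The PowerPC case from the description: 3 destinations, 2 comparisons each.
static_assert(isSuitableForBitTests(3, 6), "accepted by the original rule");
static_assert(!isSuitableWithMinCmps(3, 6, 2, 3),
              "rejected once the largest group must have at least 3 cases");
```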

…#165371) We may need to load ZT0 after the call, so we can't perform a tail call.

This test fails on some arm64 macOS runs currently. This patch bumps up the number of runs by 10x to hopefully get it passing consistently. rdar://162122184

This test is now XPASSing due to a linker update on the platform. This patch removes the XFAIL from the test. rdar://163149345

…5417) Skip the test for Windows hosts. This patch fixes the buildbot `lldb-remote-linux-win`. https://lab.llvm.org/buildbot/#/builders/197/builds/10304

…token() (#156842) Implement code generation for `__builtin_infer_alloc_token()`. The `AllocToken` pass is now registered to run unconditionally in the optimization pipeline. This ensures that all instances of the `llvm.alloc.token.id` intrinsic are lowered to constant token IDs, regardless of whether `-fsanitize=alloc-token` is enabled. This guarantees that the builtin always resolves to a token value, providing a consistent and reliable mechanism for compile-time token querying.
This completes `__builtin_infer_alloc_token(<malloc-args>, ...)` to allow compile-time querying of the token ID, where the builtin arguments mirror those normally passed to any allocation function. The argument expressions are unevaluated operands. For type-based token modes, the same type inference logic is used as for untyped allocation calls. For example the ID that is passed to (with `-fsanitize=alloc-token`):
    some_malloc(sizeof(Type), ...)
is equivalent to the token ID returned by
    __builtin_infer_alloc_token(sizeof(Type), ...)
The builtin provides a mechanism to pass or compare token IDs in code that needs to be explicitly allocation token-aware (such as inside an allocator, or through wrapper macros).
A more concrete demonstration of __builtin_infer_alloc_token's use is enabling type-aware Slab allocations in the Linux kernel: https://lore.kernel.org/all/[email protected]/
Notably, any kind of allocation-call rewriting is a poor fit for the Linux kernel's kmalloc-family functions, which are macros that wrap (multiple) layers of inline and non-inline wrapper functions. Given the Linux kernel defines its own allocation APIs, the more explicit builtin gives the right level of control over where the type inference happens and the resulting token is passed.

…ectors.

…valuation (#164026) Enables constexpr evaluation for the following AVX512 Integer Comparison Intrinsics:
```
_mm_cmp_epi8_mask
_mm_cmp_epu8_mask
_mm_cmp_epi16_mask
_mm_cmp_epu16_mask
_mm_cmp_epi32_mask
_mm_cmp_epu32_mask
_mm_cmp_epi64_mask
_mm_cmp_epu64_mask
_mm256_cmp_epi8_mask
_mm256_cmp_epu8_mask
_mm256_cmp_epi16_mask
_mm256_cmp_epu16_mask
_mm256_cmp_epi32_mask
_mm256_cmp_epu32_mask
_mm256_cmp_epi64_mask
_mm256_cmp_epu64_mask
_mm512_cmp_epi8_mask
_mm512_cmp_epu8_mask
_mm512_cmp_epi16_mask
_mm512_cmp_epu16_mask
_mm512_cmp_epi32_mask
_mm512_cmp_epu32_mask
_mm512_cmp_epi64_mask
_mm512_cmp_epu64_mask
```
Part 1 of #162054

Upstream try block with only noexcept calls inside, which doesn't need to be converted to TryCallOp Issue #154992

Update `amdgpu.wmma` op definition and implement amdgpu to rocdl conversion for new variants.

This is still somewhat of a WIP; we have some issues with this interface that are not trivial to solve. This patch tries to make the concepts of RegionBranchPoint and RegionSuccessor more robust and aligned with their definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`) op or a `RegionBranchTerminatorOpInterface` operation in a nested region.
- A `RegionSuccessor` is either one of the nested regions or the parent `RegionBranchOpInterface`.
Some new methods with reasonable default implementations are added to help resolve the flow of values across the RegionBranchOpInterface. In the current state it is still not trivial to walk the def-use chain backward with this interface. For example, when you have the 3rd block argument in the entry block of a for-loop, finding the matching operands requires knowing about the hidden loop iterator block argument and where the iterargs start. Unfortunately, the API is designed around forward-tracking of the chain. Try to reland #161575; I suspect a buildbot incremental build issue.

Add new instruction `mtlpl`.

A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from `reduce.add(mul(ext, const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to extend the constant. This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to the destination type of the extend. The first truncate is necessary so that the types of each extend's operand are then the same, and the call to canConstantBeExtended proves that the extend following a truncate is safe to do. The truncate is removed by optimisations. This is a stacked PR, 1a and 1b can be merged in any order: 1a. #147302 1b. #163175 2. -> #162503
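The truncate-then-extend argument above can be checked on scalars. The sketch below is not VPlan code: it assumes an `i8` source type and an `i32` destination type, and `fitsInSignedI8`, `truncThenSext`, and the two reduction helpers are hypothetical names. It shows that when the constant is representable in the narrow type, replacing `C` with `sext(trunc(C))` leaves `reduce.add(mul(sext(a), C))` unchanged.

```cpp
#include <cstdint>
#include <limits>

// True if C survives trunc-to-i8 followed by sext-to-i32 unchanged.
constexpr bool fitsInSignedI8(std::int32_t C) {
  return C >= std::numeric_limits<std::int8_t>::min() &&
         C <= std::numeric_limits<std::int8_t>::max();
}

// trunc to the narrow source type, then sign-extend back to the wide type.
// Only meaningful as a no-op when fitsInSignedI8(C) holds.
constexpr std::int32_t truncThenSext(std::int32_t C) {
  return static_cast<std::int32_t>(static_cast<std::int8_t>(C));
}

// reduce.add(mul(sext(a[i]), C)) with the constant used as-is.
constexpr std::int32_t reduceMulConst(const std::int8_t *A, int N,
                                      std::int32_t C) {
  std::int32_t Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += static_cast<std::int32_t>(A[I]) * C;
  return Sum;
}

// reduce.add(mul(sext(a[i]), sext(trunc(C)))): both multiply operands are now
// extends from the same narrow type.
constexpr std::int32_t reduceMulExtConst(const std::int8_t *A, int N,
                                         std::int32_t C) {
  return reduceMulConst(A, N, truncThenSext(C));
}

constexpr std::int8_t Data[3] = {1, -2, 3};
static_assert(fitsInSignedI8(5), "5 is representable in i8");
static_assert(reduceMulConst(Data, 3, 5) == reduceMulExtConst(Data, 3, 5),
              "rewriting the constant does not change the reduction");
```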

…), amt)) -> (load p + amt/8) fold (#165436) The pointer adjustment no longer guarantees any alignment Missed in #165266 and only noticed in some follow up work
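Why the alignment has to be dropped can be seen from a byte-level model of the fold. The sketch assumes a little-endian layout and models the loads with explicit byte composition so it does not depend on the host; `loadLE32` and `loadLE16` are hypothetical helpers, not DAG combiner code.

```cpp
#include <cstdint>

// Little-endian 32-bit load modeled byte by byte.
constexpr std::uint32_t loadLE32(const std::uint8_t *P) {
  return static_cast<std::uint32_t>(P[0]) |
         (static_cast<std::uint32_t>(P[1]) << 8) |
         (static_cast<std::uint32_t>(P[2]) << 16) |
         (static_cast<std::uint32_t>(P[3]) << 24);
}

// Little-endian 16-bit load modeled byte by byte.
constexpr std::uint16_t loadLE16(const std::uint8_t *P) {
  return static_cast<std::uint16_t>(P[0] | (P[1] << 8));
}

constexpr std::uint8_t Bytes[4] = {0x11, 0x22, 0x33, 0x44};

// trunc(srl(load p, 8)) reads the same bytes as a narrow load at p + 8/8.
// Even if p were 4-byte aligned, p + 1 is not, so the alignment recorded for
// the original wide load cannot be carried over to the narrow load.
static_assert(static_cast<std::uint16_t>(loadLE32(Bytes) >> 8) ==
                  loadLE16(Bytes + 1),
              "shifted wide load equals narrow load at the adjusted address");
```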

With Xqcili, `c.li` may be relaxed to `qc.e.li` (this is because `qc.e.li` is compressed into `c.li`, which needs to be undone). `qc.e.li` is relaxable, so we need to mark `c.li` as linker relaxable when it is emitted. This fixup cannot be emitted as a relocation, but we still mark it as requiring no R_RISCV_RELAX in case this changes in the future.

#165316) As described in the programming guide: https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations

Consider OpenMP stylized expression to be a template to be instantiated with a series of types listed on the containing directive (currently DECLARE_REDUCTION). Create a series of instantiations in the parser, allowing OpenMP special variables to be declared separately for each type. --------- Co-authored-by: Tom Eccles <[email protected]>

Adds the WaveActiveMin intrinsic from #99169. I think I did all of the required things on the checklist:
- [x] Implement `WaveActiveMin` clang builtin,
- [x] Link `WaveActiveMin` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveMin` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveMin` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WaveActiveMin.hlsl`
- [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WaveActiveMin-errors.hlsl`
- [x] Create the `int_dx_WaveActiveMin` intrinsic in `IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveMin` to `119` in `DXIL.td`
- [x] Create the `WaveActiveMin.ll` and `WaveActiveMin_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveMin` intrinsic in `IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveMin` lowering and map it to `int_spv_WaveActiveMin` in `SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveMin.ll`
But as some of the code has changed and was moved around (E.G. `CGBuiltin.cpp` -> `CGHLSLBuiltins.cpp`) I mostly followed how `WaveActiveMax()` is implemented. I have not been able to run the tests myself as I am unsure which project runs the correct test. Any guidance on how I can test myself would be helpful. Also added some tests to the offload-test-suite llvm/offload-test-suite#478

Need to re-check the instruction with the non-schedulable parent, only if this parent has a user phi node (i.e. it is used only outside the block) and the user instruction has unique parent instruction. Fixes issue reported in 20675ee#commitcomment-168863594

) Fix building ClangIR after RegionBranchOpInterface revamp (#165429)

In 9865171, a file named aarch64-mlr-for-calls-only.c was added to clang/include/clang/Driver. This file contains only llvm-lit directives. The file has been moved to clang/test/Driver where it ought to reside.

pull bot locked and limited conversation to collaborators Oct 22, 2025

pull bot added the ⤵️ pull label Oct 22, 2025

kazutakahirata and others added 28 commits October 25, 2025 10:05

[DAGCombine] Improve bswap lowering for machines that support bit rot…

5d0f159

…ates (#164848) Source: Hacker's delight.

[BOLT] Avoid extra function dump on invalid BBs found by UCE (NFC) (#…

b35c93f

…165111)

[ADT] Prepare for deprecation of StringSwitch cases with 3+ args. NFC. (

57828a6

#165112) Update `.Cases` and `.CasesLower` with 4+ args to use the `initializer_list` overload. The deprecation of these functions will come in a separate PR. For more context, see: #163405.

[Docs] Add CIBestPractices docs link to Reference.rst (#165108)

d748a12

Based on #165067 (comment).

[VPlan] Move isSingleScalar implementation to VPlanUtils.cpp (NFC)

d020b2d

Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to enable code sharing.

[compiler-rt][libunwind] Allow for CET on OpenBSD (#164341)

f03ccef

[ADT] Remove #include <limits> in RadixTree.h (NFC) (#165115)

5a6c236

RadixTree.h does not use anything from <limits>.

[ADT] Fix a comment in ScopedHashTable (#165116)

5d23610

This patch replaces "typedef" with "type alias" in the comment while making it more concise.

[llvm] Use iterator_range<T>(Container &&) (NFC) (#165117)

378d5ea

This patch simplifies construction of iterator_range<T> by using: iterator_range<T>(Container &&) instead of: iterator_range<T>(T begin_iterator, T end_iterator)

[Support] Modernize Uint24 in DataExtractor.h (NFC) (#165118)

e219cf6

We can use brace initializer lists to simplify constructors.

[ADT] Deprecate StringSwitch Cases with 3+ args. NFC. (#165119)

3526bb0

Suggest the `initializer_list` overload instead. 3+ args is an arbitrary number that allows for incremental deprecation without having to update too many call sites. For more context, see #163117.

[test][PowerPC] Remove unsafe-fp-math uses (NFC) (#164817)

c8f5c60

Post cleanup for #164534.

[MLIR][Python] fix getOwner to return (typed) nb::object instead of a…

c05ce9b

…bstract PyOpView (#165053) #157930 changed `nb::object getOwner()` to `PyOpView getOwner()`, which implicitly constructs the generic OpView again from a (possibly) concrete OpView. This PR fixes that.

[llvm] Make getEffectiveRelocModel helper consistent across targets. …

7ebc3db

…NFC (#165121) - On targets that don't require the Triple, don't pass it. - Use `.value_or` where possible.

[BOLT] Remove CreatePastEnd parameter in getOrCreateLocalLabel(). NFC (…

cd27741

…#165065) CreatePastEnd parameter had no effect on the label creation. Remove it.

[clang-repl] Use RegisterPTU for tracking generated TranslationUnitDe…

ff48353

…cl through parse (#164778) Instead of manually creating and adding a PTU, we should be able to use `RegisterPTU` which does the same job here.

Update company affiliation (#165003)

63b83ea

Recently switched jobs. In practice this doesn't change much since I'm still in the security group to represent Rust, but I'm updating the actual company I work for to keep the list up to date.

[LV]: Improve accuracy of calculating remaining iterations of MainLoo…

be29f0d

…pVF (#156723) Transform TC and VF to same numerical space when they are different.

[ADT] Simplify control flow in ImmutableSet (NFC) (#165133)

3ebc935

A conventional "if" statement is easier to read than the do-while(false) pattern used here.

[Support] Simplify control flow in percentDecode (NFC) (#165134)

b153e01

The "if" statement being removed in this patch is identical to the "else" clause.

fhossein-quic and others added 29 commits October 28, 2025 09:20

Add switch_case.test to profcheck-xfail.txt (#165407)

a4950c4

[AArch64][SME] Disable tail calls for callees that require saving ZT0 (…

bfb54e8

…#165371) We may need to load ZT0 after the call, so we can't perform a tail call.

[Fuzzer][Test-Only] Increase runs for reduce-inputs.test (#165402)

2aea02d

This test fails on some arm64 macOS runs currently. This patch bumps up the number of runs by 10x to hopefully get it passing consistently. rdar://162122184

[Fuzzer][Test-Only] Re-enable fuzzer-ubsan.test on Darwin (#165403)

3172970

This test is now XPASSing due to a linker update on the platform. This patch removes the XFAIL from the test. rdar://163149345

DAG: Consider __sincos_stret when deciding to form fsincos (#165169)

28e9a28

[lldb] The test added for PR#164905 doesn't run on Windows host. (#16…

624d4f6

…5417) Skip the test for Windows hosts. This patch fixes the buildbot `lldb-remote-linux-win`. https://lab.llvm.org/buildbot/#/builders/197/builds/10304

Extend vector reduction constants folding tests to include scalable v…

16f61ac

…ectors.

[CIR] Upstream Try block with only noexcept calls (#165153)

7164544

Upstream try block with only noexcept calls inside, which doesn't need to be converted to TryCallOp Issue #154992

[MemRef] Implement value bounds interface for CollapseShapeOp (#164955)

6ad9565

[mlir][amdgpu][rocdl] Add gfx1250 wmma ops (#165064)

466c526

Update `amdgpu.wmma` op definition and implement amdgpu to rocdl conversion for new variants.

[PowerPC] Implement Context Switch Instr mtlpl (#160593)

e5668d3

Add new instruction `mtlpl`.

[MLIR] Fix some typos in AffineOps.td (NFC)

87f9e1b

[X86] combineTruncate - drop load alignment after (trunc (srl (load p…

af110e1

…), amt)) -> (load p + amt/8) fold (#165436) The pointer adjustment no longer guarantees any alignment Missed in #165266 and only noticed in some follow up work

[bazel][mlir] Port #165429: RegionBranchOpInterface (#165447)

8895386

[flang][cuda] Add interfaces and lowering for barrier_try_wait(_sleep) (

56c1d35

#165316) As described in the programming guide: https://docs.nvidia.com/hpc-sdk/compilers/cuda-fortran-prog-guide/#load-and-store-functions-using-bulk-tma-operations

[CIR] Fix building ClangIR after RegionBranchOpInterface revamp (#165441

d0e0d7f

) Fix building ClangIR after RegionBranchOpInterface revamp (#165429)

[CI] fix typo in code-format job (#165461)

a449c34

[clang][Driver] Move test out of clang/include

a00bb9c

In 9865171, a file named aarch64-mlr-for-calls-only.c was added to clang/include/clang/Driver. This file contains only llvm-lit directives. The file has been moved to clang/test/Driver where it ought to reside.

pull bot merged commit a00bb9c into Ericsson:main Oct 28, 2025
2 checks passed
