Skip to content

Conversation

@pull
Copy link

@pull pull bot commented Oct 22, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot locked and limited conversation to collaborators Oct 22, 2025
@pull pull bot added the ⤵️ pull label Oct 22, 2025
kazutakahirata and others added 28 commits October 25, 2025 10:05
Without this patch, DenseMap and SmallDenseMap have distinct
implementations of shrink_and_clear.  These implementations mix a
common high-level algorithm with class-specific logic.

This patch moves the common algorithm into
DenseMapBase::shrink_and_clear.  A new private helper,
planShrinkAndClear, now handles the class-specific logic for deciding
whether to shrink the buffer.  The base class method now serves as the
single public entry point.
…165067)

Based on some recent discussion in #162007. Documenting this in the best
practices page so we have something easy to point to in code
review/reference for ourselves now that the repository has been cleaned
up.
#165112)

Update `.Cases` and `.CasesLower` with 4+ args to use the
`initializer_list` overload. The deprecation of these functions will
come in a separate PR.

For more context, see: #163405.
Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to
enable code sharing.
RadixTree.h does not use anything from <limits>.
This patch replaces "typedef" with "type alias" in the comment while
making it more concise.
This patch simplifies construction of iterator_range<T> by using:

  iterator_range<T>(Container &&)

instead of:

  iterator_range<T>(T begin_iterator, T end_iterator)
We can use brace initializer lists to simplify constructors.
#163933)

…consistent

Since the bindings now use nanobind, I changed the code examples and
mentions in the documentation prose to mention nanobind concepts and
symbols wherever applicable.

I also made the spelling of "Python" consistent by choosing the
uppercase name everywhere that's not an executable name, part of a URL,
or directory name.

----------------

Note that I left mentions of `PybindAdaptors.h` in because of
#162309.

Are there any thoughts about adding a virtual environment setup guide
using [uv](https://docs.astral.sh/uv/)? It has gotten pretty popular,
and is much faster than a "vanilla" Python pip install. It can also
bootstrap an interpreter not present on the user's machine, for example
a free-threaded Python build, with the `-p` flag to the `uv venv`
virtual environment creation command.
Suggest the `initializer_list` overload instead.

3+ args is an arbitrary number that allows for incremental depreciation
without having to update too many call sites.

For more context, see #163117.
…64793)

When SWIG is installed but not any Lua interpreter, the cmake script in
`lldb/cmake/modules/FindLuaAndSwig.cmake` will execute
`find_program(LUA_EXECUTABLE, ...)` and this will set the
`LUA_EXECUTABLE` variable to `LUA_EXECUTABLE-NOTFOUND`.

Ensure that in this case we are skipping the Lua tests requiring the
interpreter.
…bstract PyOpView (#165053)

#157930 changed `nb::object
getOwner()` to `PyOpView getOwner()` which implicitly constructs the
generic OpView against from a (possibly) concrete OpView. This PR fixes
that.
…164763)

These issues affect only Debug builds, and Release builds with asserts
enabled.

1. In `SparseTensor.h` a variable is moved-from within an assert,
introducing a side effect that alters its subsequent use, and causes
divergence between Debug and Release builds (with asserts disabled).

2. In `IterationGraphSorter.cpp`, the class constructor arguments are
moved-from to initialize class member variables via the initializer
list. Because both the arguments and class members are identically
named, there's a naming collision where the arguments shadow their
identically-named member variables counterparts inside the constructor
body. In the original code, unqualified names inside the asserts,
referred to the constructor arguments. This is wrong, because these have
already been moved-from. It's not just a UB, but is broken. These
SmallVector types when moved-from are reset i.e. the size resets to 0.
This actually renders the affected asserts ineffective, since the
comparisons operate on two hollowed-out objects and always succeed. This
name ambiguity is fixed by using 'this->' to correctly refer to the
initialized member variables carrying the relevant state.

3. While the fix 2 above made the asserts act as intended, it also
unexpectedly broke one mlir test: `llvm-lit -v
mlir/test/Dialect/SparseTensor/sparse_scalars.mlir` This required fixing
the assert logic itself, which likely has never worked and went
unnoticed all this time due to the bug 2. Specifically, in the failing
test that uses `mlir/test/Dialect/SparseTensor/sparse_scalars.mlir` the
'%argq' of 'ins' is defined as 'f32' scalar type, but the original code
inside the assert had no support for scalar types as written, and was
breaking the test.

Testing:
```
ninja check-mlir
llvm-lit -v mlir/test/Dialect/SparseTensor/sparse_scalars.mlir
```
These were disabled when adjusting tests to work with the internal shell
because the implementation on these systems of env did not support the
-u option. Now that we have switched to the internal shell and env -u is
implemented internally, these tests should work again.
…NFC (#165121)

- On targets that don't require the Triple, don't pass it.
- Use `.value_or` to where possible.
…#164694)

This reverts commit f1c1063.

PR #163453 was merged and reverted since it exposed a crash. 
After investigation the crash was unrelated and is then fixed in #164628.

This is an attempt to reland #163453.
…#165065)

CreatePastEnd parameter had no effect on the label creation. Remove it.
… at the end of the line. (#165129)

If we have a target where both # and ## are valid comment strings,
a line ending in # would trigger the lexer to eat 2 characters
and therefore lex the _next_ line as a comment. Oops. This was introduced
in 4946db1

rdar://162635338
…cl through parse (#164778)

Instead of manually creating and adding a PTU, we should be able to use
`RegisterPTU` which does the same job here.
Recently switched jobs. In practice this doesn't change much since I'm
still in the security group to represent Rust, but I'm updating the
actual company I work for to keep the list up to date.
…pVF (#156723)

Transform TC and VF to same numerical space when they are different.
A conventional "if" statement is easier to read than the
do-while(false) pattern used here.
The "if" statement being removed in this patch is identical to the
"else" clause.
fhossein-quic and others added 29 commits October 28, 2025 09:20
1. createHvxPrefixPred was computing an invalid byte count for small
predicate types, leading to a crash during instruction selection.
2. HexagonTargetLowering::SplitHvxMemOp assumed the memory vector type
is always simple. This patch adds a guard to avoid processing non-simple
vector types, which can lead to failure.

Patch By:
Fateme Hosseini

Co-authored-by: pavani karveti <[email protected]>
Co-authored-by: Sergei Larin <[email protected]>
Co-authored-by: Pavani Karveti <[email protected]>
Part of #102817.

This patch optimizes `rng::generate_n` for segmented iterators by
forwarding the implementation directly to `std::generate_n`.

- before

```
rng::generate_n(deque<int>)/32          21.7 ns         22.0 ns     32000000
rng::generate_n(deque<int>)/50          30.8 ns         30.7 ns     22400000
rng::generate_n(deque<int>)/1024         492 ns          488 ns      1120000
rng::generate_n(deque<int>)/8192        3938 ns         3924 ns       179200
```

- after

```
rng::generate_n(deque<int>)/32          11.0 ns         11.0 ns     64000000
rng::generate_n(deque<int>)/50          16.2 ns         16.1 ns     40727273
rng::generate_n(deque<int>)/1024         292 ns          286 ns      2240000
rng::generate_n(deque<int>)/8192        2291 ns         2302 ns       298667
```
…r switch lowering (#155910)

Currently it is considered suitable to lower to a bit test for a set of
switch case clusters when the the number of unique destinations
(`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:
`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6)`

However it is found for some cases on powerpc, for example, when
NumDests is 3, and the number of comparisons for each destination is all
2, it's not profitable to lower the switch to bit test. This is to add
an option to set the minimum of largest number of comparisons to use bit
test for switch lowering.

---------

Co-authored-by: Shimin Cui <[email protected]>
…#165371)

We may need to load ZT0 after the call, so we can't perform a tail call.
This test fails on some arm64 macOS runs currently.

This patch bumps up the number of runs by 10x to hopefully get it
passing consistently.

rdar://162122184
This test is now XPASSing due to a linker update on the platform.

This patch removes the XFAIL from the test.

rdar://163149345
…5417)

Skip the test for Windows hosts.
This patch fixes the buildbot `lldb-remote-linux-win`. 
https://lab.llvm.org/buildbot/#/builders/197/builds/10304
…token() (#156842)

Implement code generation for `__builtin_infer_alloc_token()`. The
`AllocToken` pass is now registered to run unconditionally in the
optimization pipeline.  This ensures that all instances of the
`llvm.alloc.token.id` intrinsic are lowered to constant token IDs,
regardless of whether `-fsanitize=alloc-token` is enabled. This
guarantees that the builtin always resolves to a token value, providing
a consistent and reliable mechanism for compile-time token querying.

This completes `__builtin_infer_alloc_token(<malloc-args>, ...)` to
allow compile-time querying of the token ID, where the builtin arguments
mirror those normally passed to any allocation function. The argument
expressions are unevaluated operands. For type-based token modes, the
same type inference logic is used as for untyped allocation calls.

For example the ID that is passed to (with `-fsanitize=alloc-token`):

    some_malloc(sizeof(Type), ...)

is equivalent to the token ID returned by

    __builtin_infer_alloc_token(sizeof(Type), ...)

The builtin provides a mechanism to pass or compare token IDs in code
that needs to be explicitly allocation token-aware (such as inside an
allocator, or through wrapper macros).

A more concrete demonstration of __builtin_infer_alloc_token's use is
enabling type-aware Slab allocations in the Linux kernel:

  https://lore.kernel.org/all/[email protected]/

Notably, any kind of allocation-call rewriting is a poor fit for the
Linux kernel's kmalloc-family functions, which are macros that wrap
(multiple) layers of inline and non-inline wrapper functions. Given the
Linux kernel defines its own allocation APIs, the more explicit builtin
gives the right level of control over where the type inference happens
and the resulting token is passed.
…valuation (#164026)

Enables constexpr evaluation for the following AVX512 Integer Comparison Intrinsics:
```
_mm_cmp_epi8_mask _mm_cmp_epu8_mask
_mm_cmp_epi16_mask _mm_cmp_epu16_mask
_mm_cmp_epi32_mask _mm_cmp_epu32_mask
_mm_cmp_epi64_mask _mm_cmp_epu64_mask

_mm256_cmp_epi8_mask _mm256_cmp_epu8_mask
_mm256_cmp_epi16_mask _mm256_cmp_epu16_mask
_mm256_cmp_epi32_mask _mm256_cmp_epu32_mask
_mm256_cmp_epi64_mask _mm256_cmp_epu64_mask

_mm512_cmp_epi8_mask _mm512_cmp_epu8_mask
_mm512_cmp_epi16_mask _mm512_cmp_epu16_mask
_mm512_cmp_epi32_mask _mm512_cmp_epu32_mask
_mm512_cmp_epi64_mask _mm512_cmp_epu64_mask
```
Part 1 of #162054
Upstream try block with only noexcept calls inside, which doesn't need
to be converted to TryCallOp

Issue #154992
Update `amdgpu.wmma` op definition and implement amdgpu to rocdl
conversion for new variants.
This is still somehow a WIP, we have some issues with this interface
that are not trivial to solve. This patch tries to make the concepts of
RegionBranchPoint and RegionSuccessor more robust and aligned with their
definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`)
op or a `RegionBranchTerminatorOpInterface` operation in a nested
region.
- A `RegionSuccessor` is either one of the nested region or the parent
`RegionBranchOpInterface`

Some new methods with reasonnable default implementation are added to
help resolving the flow of values across the RegionBranchOpInterface.

It is still not trivial in the current state to walk the def-use chain
backward with this interface. For example when you have the 3rd block
argument in the entry block of a for-loop, finding the matching operands
requires to know about the hidden loop iterator block argument and where
the iterargs start. The API is designed around forward-tracking of the
chain unfortunately.

Try to reland #161575 ; I suspect a buildbot incremental build issue.
A reduction (including partial reductions) with a multiply of a constant
value can be bundled by first converting it from `reduce.add(mul(ext,
const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to
extend the constant.

This PR adds such bundling by first truncating the constant to the
source type of the other extend, then extending it to the destination
type of the extend. The first truncate is necessary so that the types of
each extend's operand are then the same, and the call to
canConstantBeExtended proves that the extend following a truncate is
safe to do. The truncate is removed by optimisations.

This is a stacked PR, 1a and 1b can be merged in any order:
1a. #147302
1b. #163175
2. -> #162503
…), amt)) -> (load p + amt/8) fold (#165436)

The pointer adjustment no longer guarantees any alignment

Missed in #165266 and only noticed in some follow up work
With Xqcili, `c.li` may be relaxed to `qc.e.li` (this is because
`qc.e.li` is compressed into `c.li`, which needs to be undone).
`qc.e.li` is relaxable, so we need to mark `c.li` as linker relaxable
when it is emitted.

This fixup cannot be emitted as a relocation, but we still mark it as
requiring no R_RISCV_RELAX in case this changes in the future.
Consider OpenMP stylized expression to be a template to be instantiated
with a series of types listed on the containing directive (currently
DECLARE_REDUCTION). Create a series of instantiations in the parser,
allowing OpenMP special variables to be declared separately for each
type.

---------

Co-authored-by: Tom Eccles <[email protected]>
Adds the WaveActiveMin intrinsic from #99169. I think I did all of the
required things on the checklist:
- [x]  Implement `WaveActiveMin` clang builtin,
- [x]  Link `WaveActiveMin` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveMin` to
`CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveMin` to `EmitHLSLBuiltinExpr` in
`CGBuiltin.cpp`
- [x] Add codegen tests to
`clang/test/CodeGenHLSL/builtins/WaveActiveMin.hlsl`
- [x] Add sema tests to
`clang/test/SemaHLSL/BuiltIns/WaveActiveMin-errors.hlsl`
- [x] Create the `int_dx_WaveActiveMin` intrinsic in
`IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveMin` to `119` in
`DXIL.td`
- [x] Create the `WaveActiveMin.ll` and `WaveActiveMin_errors.ll` tests
in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveMin` intrinsic in
`IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveMin`
lowering and map it to `int_spv_WaveActiveMin` in
`SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in
`llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveMin.ll

But as some of the code has changed and was moved around (E.G.
`CGBuiltin.cpp` -> `CGHLSLBuiltins.cpp`) I mostly followed how
`WaveActiveMax()` is implemented.

I have not been able to run the tests myself as I am unsure which
project runs the correct test. Any guidance on how I can test myself
would be helpful.

Also added some tests to the offload-test-suite
llvm/offload-test-suite#478
Need to re-check the instruction with the non-schedulable parent, only
if this parent has a user phi node (i.e. it is used only outside the
  block) and the user instruction has unique parent instruction.

Fixes issue reported in 20675ee#commitcomment-168863594
)

Fix building ClangIR after RegionBranchOpInterface revamp (#165429)
In 9865171, a file named aarch64-mlr-for-calls-only.c was added to
clang/include/clang/Driver. This file contains only llvm-lit directives.
The file has been moved to clang/test/Driver where it ought to reside.
@pull pull bot merged commit a00bb9c into Ericsson:main Oct 28, 2025
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.