forked from llvm/llvm-project
[pull] main from llvm:main #5643
Merged
Without this patch, DenseMap and SmallDenseMap have distinct implementations of shrink_and_clear. These implementations mix a common high-level algorithm with class-specific logic. This patch moves the common algorithm into DenseMapBase::shrink_and_clear. A new private helper, planShrinkAndClear, now handles the class-specific logic for deciding whether to shrink the buffer. The base class method now serves as the single public entry point.
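The split described above (one shared algorithm in the base class, one class-specific hook that decides whether to resize) can be pictured with a small CRTP sketch. This is not the LLVM implementation: `ToyMapBase`, `ToyHeapMap`, the data members, and the sizing policy are hypothetical stand-ins, and only the names `shrink_and_clear` and `planShrinkAndClear` come from the commit message.

```cpp
#include <cstddef>
#include <vector>

// Toy CRTP base. This is NOT llvm::DenseMapBase; every name except
// shrink_and_clear and planShrinkAndClear is a hypothetical stand-in.
template <typename Derived> class ToyMapBase {
public:
  // Single public entry point that owns the shared algorithm.
  void shrink_and_clear() {
    Derived &D = static_cast<Derived &>(*this);
    const std::size_t OldEntries = D.Entries;
    D.Entries = 0;

    // Ask the derived class whether the buffer should be resized.
    std::size_t NewBuckets = 0;
    if (D.planShrinkAndClear(OldEntries, NewBuckets))
      D.Buckets.assign(NewBuckets, 0);       // use the planned bucket count
    else
      D.Buckets.assign(D.Buckets.size(), 0); // keep the size, reset contents
  }
};

class ToyHeapMap : public ToyMapBase<ToyHeapMap> {
  friend class ToyMapBase<ToyHeapMap>;

  std::vector<int> Buckets;
  std::size_t Entries = 0;

  // Class-specific policy (assumed for illustration): aim for twice the old
  // entry count, with a floor of four buckets.
  bool planShrinkAndClear(std::size_t OldEntries,
                          std::size_t &NewBuckets) const {
    NewBuckets = OldEntries * 2 < 4 ? 4 : OldEntries * 2;
    return NewBuckets != Buckets.size();
  }

public:
  ToyHeapMap(std::size_t NumBuckets, std::size_t NumEntries)
      : Buckets(NumBuckets, 1), Entries(NumEntries) {}

  std::size_t numBuckets() const { return Buckets.size(); }
  std::size_t numEntries() const { return Entries; }
};
```

A second map type would only need its own private `planShrinkAndClear`; callers keep using the base-class `shrink_and_clear`.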
…ates (#164848) Source: Hacker's delight.
Move the implementation of vputils::isSingleScalar to VPlanUtils.cpp to enable code sharing.
RadixTree.h does not use anything from <limits>.
This patch replaces "typedef" with "type alias" in the comment while making it more concise.
This patch simplifies construction of iterator_range<T> by using: iterator_range<T>(Container &&) instead of: iterator_range<T>(T begin_iterator, T end_iterator)
We can use brace initializer lists to simplify constructors.
#163933) …consistent

Since the bindings now use nanobind, I changed the code examples and mentions in the documentation prose to mention nanobind concepts and symbols wherever applicable. I also made the spelling of "Python" consistent by choosing the uppercase name everywhere that's not an executable name, part of a URL, or directory name.

----------------

Note that I left mentions of `PybindAdaptors.h` in because of #162309.

Are there any thoughts about adding a virtual environment setup guide using [uv](https://docs.astral.sh/uv/)? It has gotten pretty popular, and is much faster than a "vanilla" Python pip install. It can also bootstrap an interpreter not present on the user's machine, for example a free-threaded Python build, with the `-p` flag to the `uv venv` virtual environment creation command.
Suggest the `initializer_list` overload instead. 3+ args is an arbitrary number that allows for incremental deprecation without having to update too many call sites. For more context, see #163117.
…64793) When SWIG is installed but not any Lua interpreter, the cmake script in `lldb/cmake/modules/FindLuaAndSwig.cmake` will execute `find_program(LUA_EXECUTABLE, ...)` and this will set the `LUA_EXECUTABLE` variable to `LUA_EXECUTABLE-NOTFOUND`. Ensure that in this case we are skipping the Lua tests requiring the interpreter.
Post cleanup for #164534.
…164763) These issues affect only Debug builds and Release builds with asserts enabled.

1. In `SparseTensor.h`, a variable is moved from within an assert. This introduces a side effect that alters its subsequent use and causes divergence between Debug builds and Release builds (with asserts disabled).

2. In `IterationGraphSorter.cpp`, the class constructor arguments are moved from to initialize class member variables via the initializer list. Because the arguments and the class members are identically named, the arguments shadow their member-variable counterparts inside the constructor body. In the original code, unqualified names inside the asserts referred to the constructor arguments. This is wrong, because these have already been moved from. It is not just UB; the check itself is broken. These SmallVector types are reset when moved from, i.e. the size resets to 0. This renders the affected asserts ineffective, since the comparisons operate on two hollowed-out objects and always succeed. The name ambiguity is fixed by using `this->` to refer to the initialized member variables carrying the relevant state.

3. While fix 2 above made the asserts act as intended, it also unexpectedly broke one mlir test: `llvm-lit -v mlir/test/Dialect/SparseTensor/sparse_scalars.mlir`. This required fixing the assert logic itself, which likely has never worked and went unnoticed all this time due to bug 2. Specifically, in the failing test that uses `mlir/test/Dialect/SparseTensor/sparse_scalars.mlir`, the '%argq' of 'ins' is defined as an 'f32' scalar type, but the original code inside the assert had no support for scalar types as written, and was breaking the test.

Testing:
```
ninja check-mlir
llvm-lit -v mlir/test/Dialect/SparseTensor/sparse_scalars.mlir
```
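The shadowing problem from item 2 can be reproduced in a few lines. The sketch below is not the code from `IterationGraphSorter.cpp`: the `Sorter` struct and its members are hypothetical, and it uses `std::vector` (whose move constructor leaves the source empty) as a stand-in for `SmallVector`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class for illustration only.
struct Sorter {
  std::vector<int> ins;
  std::vector<int> maps;
  std::size_t seenByParam = 0;  // size read through the shadowing parameter
  std::size_t seenByMember = 0; // size read through this->

  // The parameters deliberately share their names with the members.
  Sorter(std::vector<int> ins, std::vector<int> maps)
      : ins(std::move(ins)), maps(std::move(maps)) {
    // Unqualified `ins` is the parameter, which was moved from above,
    // so this reads the hollowed-out object.
    seenByParam = ins.size();
    // `this->ins` is the member that now owns the data.
    seenByMember = this->ins.size();
  }
};
```

An assert written against the unqualified names would compare two empty containers and always pass, which matches the behaviour the commit message describes.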
These were disabled when adjusting tests to work with the internal shell because the implementation on these systems of env did not support the -u option. Now that we have switched to the internal shell and env -u is implemented internally, these tests should work again.
…NFC (#165121)
- On targets that don't require the Triple, don't pass it.
- Use `.value_or` where possible.
…#165065) CreatePastEnd parameter had no effect on the label creation. Remove it.
…cl through parse (#164778) Instead of manually creating and adding a PTU, we should be able to use `RegisterPTU` which does the same job here.
Recently switched jobs. In practice this doesn't change much since I'm still in the security group to represent Rust, but I'm updating the actual company I work for to keep the list up to date.
…pVF (#156723) Transform TC and VF to same numerical space when they are different.
A conventional "if" statement is easier to read than the do-while(false) pattern used here.
The "if" statement being removed in this patch is identical to the "else" clause.
1. createHvxPrefixPred was computing an invalid byte count for small predicate types, leading to a crash during instruction selection.
2. HexagonTargetLowering::SplitHvxMemOp assumed the memory vector type is always simple. This patch adds a guard to avoid processing non-simple vector types, which can lead to failure.

Patch By: Fateme Hosseini

Co-authored-by: pavani karveti <[email protected]>
Co-authored-by: Sergei Larin <[email protected]>
Co-authored-by: Pavani Karveti <[email protected]>
Part of #102817.

This patch optimizes `rng::generate_n` for segmented iterators by forwarding the implementation directly to `std::generate_n`.

- before
```
rng::generate_n(deque<int>)/32          21.7 ns         22.0 ns     32000000
rng::generate_n(deque<int>)/50          30.8 ns         30.7 ns     22400000
rng::generate_n(deque<int>)/1024         492 ns          488 ns      1120000
rng::generate_n(deque<int>)/8192        3938 ns         3924 ns       179200
```
- after
```
rng::generate_n(deque<int>)/32          11.0 ns         11.0 ns     64000000
rng::generate_n(deque<int>)/50          16.2 ns         16.1 ns     40727273
rng::generate_n(deque<int>)/1024         292 ns          286 ns      2240000
rng::generate_n(deque<int>)/8192        2291 ns         2302 ns       298667
```
…r switch lowering (#155910)

Currently it is considered suitable to lower to a bit test for a set of switch case clusters when the number of unique destinations (`NumDests`) and the number of total comparisons (`NumCmps`) satisfy:

`(NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) || (NumDests == 3 && NumCmps >= 6)`

However, in some cases on powerpc, for example when NumDests is 3 and each destination has exactly 2 comparisons, it is not profitable to lower the switch to a bit test. This patch adds an option to set the minimum of the largest number of comparisons required to use a bit test for switch lowering.

---------

Co-authored-by: Shimin Cui <[email protected]>
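To make the condition concrete, here is a small sketch. The first function transcribes the quoted rule directly. The second is an assumption about how a "minimum of largest number of comparisons" threshold could be layered on top; its name, its parameters, and the per-destination interpretation are hypothetical and are not taken from the patch.

```cpp
#include <vector>

// The suitability rule quoted in the commit message.
bool meetsBaselineBitTestRule(unsigned NumDests, unsigned NumCmps) {
  return (NumDests == 1 && NumCmps >= 3) ||
         (NumDests == 2 && NumCmps >= 5) ||
         (NumDests == 3 && NumCmps >= 6);
}

// Hypothetical extra gate: additionally require that the destination with
// the most comparisons has at least MinLargestCmps of them.
bool meetsBitTestRuleWithMinimum(const std::vector<unsigned> &CmpsPerDest,
                                 unsigned MinLargestCmps) {
  unsigned NumCmps = 0;
  unsigned Largest = 0;
  for (unsigned C : CmpsPerDest) {
    NumCmps += C;
    if (C > Largest)
      Largest = C;
  }
  const unsigned NumDests = static_cast<unsigned>(CmpsPerDest.size());
  return meetsBaselineBitTestRule(NumDests, NumCmps) &&
         Largest >= MinLargestCmps;
}
```

With three destinations of two comparisons each, the baseline rule accepts the cluster set (3 destinations, 6 comparisons), while a threshold of 3 under this sketch would reject it.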
…#165371) We may need to load ZT0 after the call, so we can't perform a tail call.
This test fails on some arm64 macOS runs currently. This patch bumps up the number of runs by 10x to hopefully get it passing consistently. rdar://162122184
This test is now XPASSing due to a linker update on the platform. This patch removes the XFAIL from the test. rdar://163149345
…5417) Skip the test for Windows hosts. This patch fixes the buildbot `lldb-remote-linux-win`. https://lab.llvm.org/buildbot/#/builders/197/builds/10304
…token() (#156842)

Implement code generation for `__builtin_infer_alloc_token()`.

The `AllocToken` pass is now registered to run unconditionally in the optimization pipeline. This ensures that all instances of the `llvm.alloc.token.id` intrinsic are lowered to constant token IDs, regardless of whether `-fsanitize=alloc-token` is enabled. This guarantees that the builtin always resolves to a token value, providing a consistent and reliable mechanism for compile-time token querying.

This completes `__builtin_infer_alloc_token(<malloc-args>, ...)` to allow compile-time querying of the token ID, where the builtin arguments mirror those normally passed to any allocation function. The argument expressions are unevaluated operands. For type-based token modes, the same type inference logic is used as for untyped allocation calls.

For example, the ID that is passed to (with `-fsanitize=alloc-token`):

    some_malloc(sizeof(Type), ...)

is equivalent to the token ID returned by

    __builtin_infer_alloc_token(sizeof(Type), ...)

The builtin provides a mechanism to pass or compare token IDs in code that needs to be explicitly allocation token-aware (such as inside an allocator, or through wrapper macros).

A more concrete demonstration of `__builtin_infer_alloc_token`'s use is enabling type-aware Slab allocations in the Linux kernel: https://lore.kernel.org/all/[email protected]/

Notably, any kind of allocation-call rewriting is a poor fit for the Linux kernel's kmalloc-family functions, which are macros that wrap (multiple) layers of inline and non-inline wrapper functions. Given the Linux kernel defines its own allocation APIs, the more explicit builtin gives the right level of control over where the type inference happens and the resulting token is passed.
…valuation (#164026)

Enables constexpr evaluation for the following AVX512 Integer Comparison Intrinsics:
```
_mm_cmp_epi8_mask _mm_cmp_epu8_mask _mm_cmp_epi16_mask _mm_cmp_epu16_mask _mm_cmp_epi32_mask _mm_cmp_epu32_mask _mm_cmp_epi64_mask _mm_cmp_epu64_mask
_mm256_cmp_epi8_mask _mm256_cmp_epu8_mask _mm256_cmp_epi16_mask _mm256_cmp_epu16_mask _mm256_cmp_epi32_mask _mm256_cmp_epu32_mask _mm256_cmp_epi64_mask _mm256_cmp_epu64_mask
_mm512_cmp_epi8_mask _mm512_cmp_epu8_mask _mm512_cmp_epi16_mask _mm512_cmp_epu16_mask _mm512_cmp_epi32_mask _mm512_cmp_epu32_mask _mm512_cmp_epi64_mask _mm512_cmp_epu64_mask
```
Part 1 of #162054
Upstream try block with only noexcept calls inside, which doesn't need to be converted to TryCallOp.

Issue #154992
Update `amdgpu.wmma` op definition and implement amdgpu to rocdl conversion for new variants.
This is still somewhat of a WIP; we have some issues with this interface that are not trivial to solve.

This patch tries to make the concepts of RegionBranchPoint and RegionSuccessor more robust and aligned with their definition:
- A `RegionBranchPoint` is either the parent (`RegionBranchOpInterface`) op or a `RegionBranchTerminatorOpInterface` operation in a nested region.
- A `RegionSuccessor` is either one of the nested regions or the parent `RegionBranchOpInterface`.

Some new methods with reasonable default implementations are added to help resolve the flow of values across the RegionBranchOpInterface.

It is still not trivial in the current state to walk the def-use chain backward with this interface. For example, when you have the 3rd block argument in the entry block of a for-loop, finding the matching operands requires knowing about the hidden loop iterator block argument and where the iterargs start. Unfortunately, the API is designed around forward-tracking of the chain.

Try to reland #161575; I suspect a buildbot incremental build issue.
Add new instruction `mtlpl`.
A reduction (including partial reductions) with a multiply of a constant value can be bundled by first converting it from `reduce.add(mul(ext, const))` to `reduce.add(mul(ext, ext(const)))` as long as it is safe to extend the constant.

This PR adds such bundling by first truncating the constant to the source type of the other extend, then extending it to the destination type of the extend. The first truncate is necessary so that the types of each extend's operand are then the same, and the call to canConstantBeExtended proves that the extend following a truncate is safe to do. The truncate is removed by optimisations.

This is a stacked PR; 1a and 1b can be merged in any order:
- 1a. #147302
- 1b. #163175
- 2. -> #162503
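One way to picture "safe to extend the constant" is a round-trip check: truncate the constant to the narrow source type, extend it back, and see whether the original value is recovered. The sketch below assumes a hypothetical 8-bit source type and 32-bit destination type. It is not the implementation of canConstantBeExtended, and all function names are invented for illustration.

```cpp
#include <cstdint>

// Truncate C to 8 bits, then zero-extend back to 32 bits.
std::int32_t truncThenZExt8(std::int32_t C) {
  return static_cast<std::int32_t>(static_cast<std::uint8_t>(C));
}

// Truncate C to 8 bits, then sign-extend back to 32 bits.
std::int32_t truncThenSExt8(std::int32_t C) {
  const std::int32_t Low = truncThenZExt8(C); // low 8 bits, in [0, 255]
  return Low >= 128 ? Low - 256 : Low;        // reinterpret bit 7 as the sign
}

// The constant survives the rewrite only if the round trip is lossless.
bool roundTripsViaZExt8(std::int32_t C) { return truncThenZExt8(C) == C; }
bool roundTripsViaSExt8(std::int32_t C) { return truncThenSExt8(C) == C; }
```

Under these assumptions, a constant such as 100 round-trips through either kind of extend, 200 only through a zero-extend, -3 only through a sign-extend, and 300 through neither.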
With Xqcili, `c.li` may be relaxed to `qc.e.li` (this is because `qc.e.li` is compressed into `c.li`, which needs to be undone). `qc.e.li` is relaxable, so we need to mark `c.li` as linker relaxable when it is emitted. This fixup cannot be emitted as a relocation, but we still mark it as requiring no R_RISCV_RELAX in case this changes in the future.
Consider OpenMP stylized expression to be a template to be instantiated with a series of types listed on the containing directive (currently DECLARE_REDUCTION). Create a series of instantiations in the parser, allowing OpenMP special variables to be declared separately for each type. --------- Co-authored-by: Tom Eccles <[email protected]>
Adds the WaveActiveMin intrinsic from #99169. I think I did all of the required things on the checklist:
- [x] Implement `WaveActiveMin` clang builtin,
- [x] Link `WaveActiveMin` clang builtin with `hlsl_intrinsics.h`
- [x] Add sema checks for `WaveActiveMin` to `CheckHLSLBuiltinFunctionCall` in `SemaChecking.cpp`
- [x] Add codegen for `WaveActiveMin` to `EmitHLSLBuiltinExpr` in `CGBuiltin.cpp`
- [x] Add codegen tests to `clang/test/CodeGenHLSL/builtins/WaveActiveMin.hlsl`
- [x] Add sema tests to `clang/test/SemaHLSL/BuiltIns/WaveActiveMin-errors.hlsl`
- [x] Create the `int_dx_WaveActiveMin` intrinsic in `IntrinsicsDirectX.td`
- [x] Create the `DXILOpMapping` of `int_dx_WaveActiveMin` to `119` in `DXIL.td`
- [x] Create the `WaveActiveMin.ll` and `WaveActiveMin_errors.ll` tests in `llvm/test/CodeGen/DirectX/`
- [x] Create the `int_spv_WaveActiveMin` intrinsic in `IntrinsicsSPIRV.td`
- [x] In SPIRVInstructionSelector.cpp create the `WaveActiveMin` lowering and map it to `int_spv_WaveActiveMin` in `SPIRVInstructionSelector::selectIntrinsic`.
- [x] Create SPIR-V backend test case in `llvm/test/CodeGen/SPIRV/hlsl-intrinsics/WaveActiveMin.ll`

But as some of the code has changed and was moved around (e.g. `CGBuiltin.cpp` -> `CGHLSLBuiltins.cpp`), I mostly followed how `WaveActiveMax()` is implemented.

I have not been able to run the tests myself as I am unsure which project runs the correct test. Any guidance on how I can test myself would be helpful.

Also added some tests to the offload-test-suite llvm/offload-test-suite#478
The instruction with the non-schedulable parent needs to be re-checked only if this parent has a user phi node (i.e. it is used only outside the block) and the user instruction has a unique parent instruction. Fixes issue reported in 20675ee#commitcomment-168863594
) Fix building ClangIR after RegionBranchOpInterface revamp (#165429)
In 9865171, a file named aarch64-mlr-for-calls-only.c was added to clang/include/clang/Driver. This file contains only llvm-lit directives. The file has been moved to clang/test/Driver where it ought to reside.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)