forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 4
[AutoBump] Merge with 8612fa0d (9) #266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This has been a supported option for ELF and is added to the COFF Linker in llvm#85701
Make form-expressions not create `emitc.expression`s for operations inside the `emitc.expression`s, since they are invalid.
…ated calls to getDataLayout(). NFC.
`fneg + copysign` is better than fmul for analysis/codegen. godbolt: https://godbolt.org/z/eEs6dGd1G Alive2: https://alive2.llvm.org/ce/z/K3M5BA
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.
…lvm#85990) macosx and darwin in triples are equivalent. rdar://124246653
This adds a libc++ to modules.json as is currently used by libc++. When libc++.so is not found the function will search for libc++.a as fallback.
…83876) This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` that will always take at most one parameter, vscale, and returns a single result, which is the bound of `value`. The API is defined as follows: ```c++ FailureOr<ConstantOrScalableBound> vector::ScalableValueBoundsConstraintSet::computeScalableBound( Value value, std::optional<int64_t> dim, unsigned vscaleMin, unsigned vscaleMax, presburger::BoundType boundType, bool closedUB = true, StopConditionFn stopCondition = nullptr); ``` Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).
NFC. gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the assembler and disassembler tests to be sorted based on real16 or fake16 instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16 (fake16 in encoding, but not by name) upstream, so that is why the test files have a -fake16 suffix. One test input is changed, and that is the disassembler test for unsupported bits in the instruction. It is now an input that is valid on both GFX11 and GFX12. This was necessary because the size of the opcode field changed.
NFC. Fix CHECK lines that seem to have a copy paste error. Move the test that was formerly in gfx12_dasm_vinterp.txt (see llvm#85949).
Check for all frame instructions in finalize isel, not just for the frame setup opcode. This was proven necessary, see llvm#78001 for discussion.
Document expectations for contributions to InstCombine, especially regarding test coverage and alive2 proofs.
if cost threshold is very low.
…plementation (llvm#86125) Allow targets to rely on TargetLowering::isGuaranteedNotToBeUndefOrPoisonForTargetNode to test nodes for canCreateUndefOrPoisonForTargetNode + all arguments are isGuaranteedNotToBeUndefOrPoison. Targets can still perform this themselves for specific special case nodes (e.g. target shuffles). Matches the fallback in SelectionDAG::isGuaranteedNotToBeUndefOrPoison
…x on RV32. (llvm#85871) I believe we can use XLen alignment as long as eliminateFrameIndex limits the maximum folded offset to 2043. This way when we split the load/store into two 2 instructions we'll be able to add 4 without overflowing simm12.
Adds stdio.h's rename() function as defined in n3096. Fixes llvm#84980.
…STALL_MODULES_DIR (llvm#86020) This reapplies 272d1b4 (from llvm#85756), which was reverted in 4079370. In the previous attempt, empty CMAKE_INSTALL_PREFIX was handled by quoting them, in d209d13. That made the calls to cmake_path(ABSOLUTE_PATH) succeed, but the output paths of that weren't actually absolute, which was required by file(RELATIVE_PATH). Avoid this issue by constructing a non-empty base directory variable to use for calculating the relative path.
…m#83773) Dwarf 5 allows separating filenames from .debug_line into a separate .debug_line_str section. The strings are referenced relative to the start of the .debug_line_str section. Previously, on COFF, the relocation information instead caused offsets to be relocated to the base address of the COFF-File. This lead to wrong offsets in linked COFF (PE) files which caused the debugger to be unable to find the correct source files. This patch fixes this problem by making the offsets relative to the start of the .debug_line_str section instead. There should be no changes for ELF-Files as everything seems to be working there. A test is also added to ensure that the correct relocation entries are emitted.
This is another step towards supporting DWARF5 checksums and inline source code in LLDB. This is a reland of llvm#85468 but without the functional change of storing the support file from the line table (yet).
The range for these operations is being constructed without the maximum value for the range due to an incorrect usage of the ConstantRange constructor. This causes Float2Int to think the range for 'uitofp i1' only contains 0 instead of 0 and 1.
Investigate the lowering of MemRef Load/Store ops and implement additional folding of created ops Aims to improve readability of generated lowered SPIR-V code. Part of work llvm#70704
…lvm#86002) The Doxygen comments for the `details` field of a progress report currently does not specify that this field will act as the initial set of details for a progress report that gets updated with `Progress::Increment()`. This commit clarifies this.
…egardless of Zfa. (llvm#85982) Previously we used BuildPairF64 and SplitF64 only if Zfa was supported since they will select register file moves that are only available with Zfa. We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to not go through memory so we should use that for bitcast. That leaves the D without Zfa case that does need to go through memory. Previously we let type legalization expand to loads and stores using a new stack temporary created for each bitcast. After this patch we will create the loads ands stores in the custom inserter and share the same stack slot for all. This also allows DAGCombiner to optimize when bitcast is mixed with BuildPairF64/SplitF64.
Some header guards conflicted with clang. Fix a few others to follow the convention in the rest of the headers in flang.
…m#86041) We were passing the min and max values of the range to the ConstantRange constructor, but the constructor expects the upper bound to 1 more than the max value so we need to add 1. We also need to use getNonEmpty so that passing 0, 0 to the constructor creates a full range rather than an empty range. And passing smin, smax+1 doesn't cause an assertion. I believe this fixes at least some of the reason llvm#79158 was reverted.
…llvm#83027) According to POSIX 2018. 1. lineptr, n and stream can not be NULL. 2. If *n is non-zero, *lineptr must point to a region of at least *n bytes, or be a NULL pointer. Additionally, if *lineptr is not NULL, *n must not be undefined.
This change completes llvm#86155 - `DXIL.td` - lowering `fabs` intrinsic to the float dxil op. - `DXILIntrinsicExpansion.cpp` - Add intrinsic expansion for the abs case.
Completes llvm#86170 Completes llvm#86172 - `DXIL.td` - Add changes to lower the cosine and floor intrinsics to dxilOps.
There are a bunch of related directories in another dialects.
Made by following: llvm#83585 (comment) Thanks for the details Tomek! CPP-5080
SLP should be doing a better job, but both shuffles lower to poorer codegen than necessary
…ABD patterns. NFC.
…read-safe code (llvm#80813) **Description** The documentation of `transform.structured.tile_using_forall` says: _"It is the user’s responsibility to ensure that num_threads/tile_sizes is a valid tiling specification (i.e. that only tiles parallel dimensions, e.g. in the Linalg case)."_ In other words, tiling a non-parallel dimension would generate code with data races which is not safe to parallelize. For example, consider this example (included in the tests in this PR): ``` func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> { %0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) { %1 = affine.min #map(%arg2) %2 = affine.max #map1(%1) %3 = affine.apply #map2(%arg2) %extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32> %4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) { ^bb0(%in: f32, %out: f32): %5 = arith.addf %in, %out : f32 linalg.yield %5 : f32 } -> tensor<300x8xf32> scf.forall.in_parallel { tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32> } } return %0 : tensor<300x8xf32> } ``` We can easily see that this is not safe to parallelize because all threads would be writing to the same position in `%arg3` (in the `scf.forall.in_parallel`. This PR detects wether it's safe to `tile_using_forall` and emits a warning in the case it is not. **Brief explanation** It first generates a vector of affine expressions representing the tile values and stores it in `dimExprs`. These affine expressions are compared with the affine expressions coming from the results of the affine map of each output in the linalg op. So going back to the previous example, the original transform is: ``` #map = affine_map<(d0, d1, d2) -> (d0, d1, d2)> #map1 = affine_map<(d0, d1, d2) -> (d1, d2)> func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> { // expected-warning@+1 {{tiling is not thread safe at axis #0}} %0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) { ^bb0(%in: f32, %out: f32): %1 = arith.addf %in, %out : f32 linalg.yield %1 : f32 } -> tensor<300x8xf32> return %0 : tensor<300x8xf32> } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op %forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8] : (!transform.any_op) -> (!transform.any_op, !transform.any_op) transform.yield } } ``` The `num_threads` attribute would be represented as `(d0)`. Because the linalg op has only one output (`arg1`) it would only check against the results of `#map1`, which are `(d1, d2)`. The idea is to check that all affine expressions in `dimExprs` are present in the output affine map. In this example, `d0` is not in `(d1, d2)`, so tiling that axis is considered not thread safe.
…ry (llvm#86217) Summary: The original intention of the `openmp-add-rpath` option was to add the rpath to the language runtime directory. However, the current implementation only adds it to the compiler's resource directory. This patch adds support for appending the `-rpath` to the compiler's standard library directory as well. Currently this is `<exe>/../lib/<triple>`.
…st (llvm#86198) Completes llvm#84839 --------- Co-authored-by: Farzon Lotfi <[email protected]>
…nfoFormat (drop additional 'r' before Format)
This was changed by llvm#84765 but turned out to be buggy. Since it isn't used and isn't tested, it is probably best to remove it.
…#85620) Recently llvm#84765 made the split markers of various tools configurable but did not test *not* using the split markers for two of them. This PR adds those tests.
…lent of) a C-style cast) (llvm#85263) Implement [LWG3528](https://wg21.link/LWG3528). Based on LWG3528(https://wg21.link/LWG3528) and http://eel.is/c++draft/description#structure.requirements-9, the standard allows to impose requirements, we constraint `std::make_from_tuple` to make `std::make_from_tuple` SFINAE friendly and also avoid worse diagnostic messages. We still keep the constraints of `std::__make_from_tuple_impl` so that `std::__make_from_tuple_impl` will have the same advantages when used alone. --------- Signed-off-by: yronglin <[email protected]>
…d llvm-lto2 (llvm#86271) Directly load all bitcode into the new debug info format in `llvm-lto` and `llvm-lto2`. This means that new-mode bitcode no longer round-trips back to old-mode after parsing, and that old-mode bitcode gets auto-upgraded to new-mode debug info (which is the current in-memory default in LLVM).
…ion. (llvm#86150) The option to specify the style of alignment of the colons inside TableGen's DAGArg.
Our existing diagnostics for catching unsequenced modifications handles test coverage for N1282, which is correcting the standard based on the resolution of DR087.
…ectives (llvm#84349) This PR refactors bounds offsetting by combining the two differing implementations (one applying to initial derived type member map implementation for descriptors and the other for regular arrays, effectively allocatable array vs regular array in fortran) now that it's a little simpler to do. The PR also moves the utilization of createAlteredByCaptureMap into genMapInfoOp, where it will be correctly applied to all MapInfoData, appropriately offsetting and altering Pointer data set in the kernel argument structure on the host. This primarily means bounds offsets will now correctly apply to enter/exit/update map clauses as opposed to just the Target directive that is currently the case. A few fortran runtime tests have been added to verify this new behavior. This PR depends on: llvm#84328 and is an extraction of the larger derived type member map PR stack (so a requirement for it to land).
cmcgirr-amd
approved these changes
Aug 16, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.