[AutoBump] Merge with 8612fa0d (9) #266

cferry-AMD · 2024-08-16T10:25:27Z

No description provided.

…83876) This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture) to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` that will always take at most one parameter, vscale, and returns a single result, which is the bound of `value`. The API is defined as follows:

```c++
FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
    Value value, std::optional<int64_t> dim,
    unsigned vscaleMin, unsigned vscaleMax,
    presburger::BoundType boundType, bool closedUB = true,
    StopConditionFn stopCondition = nullptr);
```

Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).

NFC. gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the assembler and disassembler tests to be sorted based on real16 or fake16 instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16 (fake16 in encoding, but not by name) upstream, so that is why the test files have a -fake16 suffix. One test input is changed, and that is the disassembler test for unsupported bits in the instruction. It is now an input that is valid on both GFX11 and GFX12. This was necessary because the size of the opcode field changed.

…plementation (llvm#86125) Allow targets to rely on TargetLowering::isGuaranteedNotToBeUndefOrPoisonForTargetNode to test nodes for canCreateUndefOrPoisonForTargetNode + all arguments are isGuaranteedNotToBeUndefOrPoison. Targets can still perform this themselves for specific special case nodes (e.g. target shuffles). Matches the fallback in SelectionDAG::isGuaranteedNotToBeUndefOrPoison

…x on RV32. (llvm#85871) I believe we can use XLen alignment as long as eliminateFrameIndex limits the maximum folded offset to 2043. This way, when we split the load/store into two instructions, we'll be able to add 4 without overflowing simm12.
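The arithmetic behind the 2043 limit can be checked with a minimal sketch. It assumes only that simm12 is a 12-bit signed immediate with range [-2048, 2047]; `fitsSimm12` and `canFoldSplitAccess` are hypothetical names used for illustration and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// A 12-bit signed immediate covers the range [-2048, 2047].
constexpr bool fitsSimm12(int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical helper: true if both halves of a split 64-bit access,
// located at `offset` and `offset + 4`, can still use a folded immediate.
constexpr bool canFoldSplitAccess(int64_t offset) {
  return fitsSimm12(offset) && fitsSimm12(offset + 4);
}

// 2043 + 4 == 2047 is the largest value that still fits.
static_assert(canFoldSplitAccess(2043), "2043 is the largest safe offset");
// 2044 + 4 == 2048 would overflow simm12.
static_assert(!canFoldSplitAccess(2044), "2044 is one past the limit");
```

Because both helpers are `constexpr`, the two boundary cases are verified at compile time.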

…STALL_MODULES_DIR (llvm#86020) This reapplies 272d1b4 (from llvm#85756), which was reverted in 4079370. In the previous attempt, empty CMAKE_INSTALL_PREFIX was handled by quoting them, in d209d13. That made the calls to cmake_path(ABSOLUTE_PATH) succeed, but the output paths of that weren't actually absolute, which was required by file(RELATIVE_PATH). Avoid this issue by constructing a non-empty base directory variable to use for calculating the relative path.

…m#83773) DWARF 5 allows separating filenames from .debug_line into a separate .debug_line_str section. The strings are referenced relative to the start of the .debug_line_str section. Previously, on COFF, the relocation information instead caused offsets to be relocated to the base address of the COFF file. This led to wrong offsets in linked COFF (PE) files, which caused the debugger to be unable to find the correct source files. This patch fixes this problem by making the offsets relative to the start of the .debug_line_str section instead. There should be no changes for ELF files as everything seems to be working there. A test is also added to ensure that the correct relocation entries are emitted.

The range for these operations is being constructed without the maximum value for the range due to an incorrect usage of the ConstantRange constructor. This causes Float2Int to think the range for 'uitofp i1' only contains 0 instead of 0 and 1.

…lvm#86002) The Doxygen comments for the `details` field of a progress report currently do not specify that this field will act as the initial set of details for a progress report that gets updated with `Progress::Increment()`. This commit clarifies this.

…egardless of Zfa. (llvm#85982) Previously we used BuildPairF64 and SplitF64 only if Zfa was supported since they will select register file moves that are only available with Zfa. We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to not go through memory so we should use that for bitcast. That leaves the D without Zfa case that does need to go through memory. Previously we let type legalization expand to loads and stores using a new stack temporary created for each bitcast. After this patch we will create the loads and stores in the custom inserter and share the same stack slot for all. This also allows DAGCombiner to optimize when bitcast is mixed with BuildPairF64/SplitF64.

…m#86041) We were passing the min and max values of the range to the ConstantRange constructor, but the constructor expects the upper bound to be 1 more than the max value, so we need to add 1. We also need to use getNonEmpty so that passing 0, 0 to the constructor creates a full range rather than an empty range, and so that passing smin, smax+1 doesn't cause an assertion. I believe this fixes at least some of the reason llvm#79158 was reverted.
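The off-by-one described above comes from the range being half-open. The following is a toy 8-bit model, not the LLVM `ConstantRange` class: `ToyRange` and `fromClosedInterval` are hypothetical names, and the `full` flag is an assumption made here to illustrate why an equal lower and upper bound needs a "non-empty" interpretation.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a half-open, wrapping range [lower, upper) over 8-bit values.
// When lower == upper the pair alone is ambiguous, so `full` records whether
// the range is meant to be empty or to contain every value.
struct ToyRange {
  uint8_t lower;
  uint8_t upper; // exclusive
  bool full;     // only consulted when lower == upper

  bool contains(uint8_t v) const {
    if (lower == upper)
      return full;
    if (lower < upper)
      return lower <= v && v < upper;
    return v >= lower || v < upper; // range wraps around the maximum value
  }
};

// Build the range covering the closed interval [min, max]. The exclusive
// upper bound is max + 1, which may wrap around to min (for example min = 0,
// max = 255); that case is treated as the full range rather than empty.
inline ToyRange fromClosedInterval(uint8_t min, uint8_t max) {
  uint8_t upper = static_cast<uint8_t>(max + 1);
  return ToyRange{min, upper, /*full=*/true};
}
```

In this model, passing the maximum itself as the upper bound for a 0..1 interval yields a range that contains 0 but not 1, while `fromClosedInterval(0, 1)` contains both.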

…llvm#83027) According to POSIX 2018. 1. lineptr, n and stream can not be NULL. 2. If *n is non-zero, *lineptr must point to a region of at least *n bytes, or be a NULL pointer. Additionally, if *lineptr is not NULL, *n must not be undefined.

…read-safe code (llvm#80813)

**Description**

The documentation of `transform.structured.tile_using_forall` says: _"It is the user’s responsibility to ensure that num_threads/tile_sizes is a valid tiling specification (i.e. that only tiles parallel dimensions, e.g. in the Linalg case)."_

In other words, tiling a non-parallel dimension would generate code with data races, which is not safe to parallelize. For example, consider the following (included in the tests in this PR):

```
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  %0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) {
    %1 = affine.min #map(%arg2)
    %2 = affine.max #map1(%1)
    %3 = affine.apply #map2(%arg2)
    %extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32>
    %4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) {
    ^bb0(%in: f32, %out: f32):
      %5 = arith.addf %in, %out : f32
      linalg.yield %5 : f32
    } -> tensor<300x8xf32>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32>
    }
  }
  return %0 : tensor<300x8xf32>
}
```

We can easily see that this is not safe to parallelize because all threads would be writing to the same position in `%arg3` (in the `scf.forall.in_parallel`). This PR detects whether it's safe to `tile_using_forall` and emits a warning if it is not.

**Brief explanation**

It first generates a vector of affine expressions representing the tile values and stores it in `dimExprs`. These affine expressions are compared with the affine expressions coming from the results of the affine map of each output in the linalg op. So going back to the previous example, the original transform is:

```
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d1, d2)>
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  // expected-warning@+1 {{tiling is not thread safe at axis #0}}
  %0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %1 = arith.addf %in, %out : f32
    linalg.yield %1 : f32
  } -> tensor<300x8xf32>
  return %0 : tensor<300x8xf32>
}
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op
    %forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The `num_threads` attribute would be represented as `(d0)`. Because the linalg op has only one output (`arg1`), it would only check against the results of `#map1`, which are `(d1, d2)`. The idea is to check that all affine expressions in `dimExprs` are present in the output affine map. In this example, `d0` is not in `(d1, d2)`, so tiling that axis is considered not thread safe.
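The membership check described above can be sketched without any MLIR types. This is a simplified illustration under stated assumptions: dimensions are modelled as plain integers (d0 as 0, d1 as 1, and so on), only a single output map is considered, and `unsafeTiledDims` is a hypothetical name rather than a function from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the tiled dimensions that do not appear among the results of the
// output's indexing map. Tiling such a dimension means several threads can
// write to the same output element, so each one would warrant a warning.
inline std::vector<int> unsafeTiledDims(const std::vector<int> &tiledDims,
                                        const std::vector<int> &outputMapResults) {
  std::vector<int> unsafe;
  for (int dim : tiledDims) {
    bool present = std::find(outputMapResults.begin(), outputMapResults.end(),
                             dim) != outputMapResults.end();
    if (!present)
      unsafe.push_back(dim);
  }
  return unsafe;
}
```

For the example in the description, the tiled dimensions are `{0}` and the output map results are `{1, 2}`, so the sketch reports axis 0 as unsafe. Tiling `d1` instead would produce an empty list.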

…ry (llvm#86217) Summary: The original intention of the `openmp-add-rpath` option was to add the rpath to the language runtime directory. However, the current implementation only adds it to the compiler's resource directory. This patch adds support for appending the `-rpath` to the compiler's standard library directory as well. Currently this is `<exe>/../lib/<triple>`.

…lent of) a C-style cast) (llvm#85263) Implement [LWG3528](https://wg21.link/LWG3528). Based on [LWG3528](https://wg21.link/LWG3528) and http://eel.is/c++draft/description#structure.requirements-9, the standard allows requirements to be imposed, so we constrain `std::make_from_tuple` to make it SFINAE friendly and to avoid worse diagnostic messages. We still keep the constraints of `std::__make_from_tuple_impl` so that it has the same advantages when used alone. --------- Signed-off-by: yronglin <[email protected]>

…d llvm-lto2 (llvm#86271) Directly load all bitcode into the new debug info format in `llvm-lto` and `llvm-lto2`. This means that new-mode bitcode no longer round-trips back to old-mode after parsing, and that old-mode bitcode gets auto-upgraded to new-mode debug info (which is the current in-memory default in LLVM).

…ectives (llvm#84349) This PR refactors bounds offsetting by combining the two differing implementations (one applying to the initial derived type member map implementation for descriptors and the other for regular arrays, effectively allocatable array vs regular array in Fortran) now that it's a little simpler to do. The PR also moves the utilization of createAlteredByCaptureMap into genMapInfoOp, where it will be correctly applied to all MapInfoData, appropriately offsetting and altering Pointer data set in the kernel argument structure on the host. This primarily means bounds offsets will now correctly apply to enter/exit/update map clauses, as opposed to just the Target directive, as is currently the case. A few Fortran runtime tests have been added to verify this new behavior. This PR depends on llvm#84328 and is an extraction of the larger derived type member map PR stack (so a requirement for it to land).

chrulski-intel and others added 30 commits March 21, 2024 21:34

[LLD] [MinGW] Implement the -lto-sample-profile option (llvm#85841)

276283d

This has been a supported option for ELF and is added to the COFF Linker in llvm#85701

[mlir][emitc] Fix form-expressions inside expression (llvm#86081)

5344a37

Make form-expressions not create `emitc.expression`s for operations inside the `emitc.expression`s, since they are invalid.

[VectorCombine] Add DataLayout to VectorCombine class instead of repe…

15eba9c

…ated calls to getDataLayout(). NFC.

[ARM] Regenerate some check lines. NFC

686f459

[InstCombine] Fold fmul X, -0.0 into copysign(0.0, -X) (llvm#85772)

2bfa7d0

`fneg + copysign` is better than fmul for analysis/codegen. godbolt: https://godbolt.org/z/eEs6dGd1G Alive2: https://alive2.llvm.org/ce/z/K3M5BA
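The equivalence behind this fold can be checked numerically for finite inputs. The sketch below is an illustration only: the helper names are hypothetical, and it deliberately ignores infinities and NaNs, for which `x * -0.0` yields NaN rather than a signed zero. The exact preconditions InstCombine requires are not stated in the commit message.

```cpp
#include <cassert>
#include <cmath>

// For finite x, x * -0.0 is a zero whose sign is the opposite of x's sign.
inline double viaMul(double x) { return x * -0.0; }

// The replacement form: a zero carrying the sign of -x.
inline double viaCopysign(double x) { return std::copysign(0.0, -x); }

// Compare both value and sign bit, because 0.0 == -0.0 under operator==.
inline bool sameZero(double a, double b) {
  return a == 0.0 && b == 0.0 && std::signbit(a) == std::signbit(b);
}
```

Comparing with `sameZero` rather than `==` matters here, since the two forms would otherwise look equal even if they disagreed on the sign of the zero.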

[AMDGPU] MCExpr-ify MC layer kernel descriptor (llvm#80855)

857161c

Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.

[gn build] Port 857161c

26c3d01

Disable driver tests on macosx that are currently disabled on darwin (l…

a11d9b4

…lvm#85990) macosx and darwin in triples are equivalent. rdar://124246653

[clang] Improves -print-library-module-manifest-path. (llvm#85943)

2152094

This adds a libc++ to modules.json as is currently used by libc++. When libc++.so is not found the function will search for libc++.a as fallback.

[AMDGPU][MC] Fix GFX12 check line typo and move test

44278f2

NFC. Fix CHECK lines that seem to have a copy paste error. Move the test that was formerly in gfx12_dasm_vinterp.txt (see llvm#85949).

Check for all frame instructions in finalize isel. (llvm#85945)

b4b5e82

Check for all frame instructions in finalize isel, not just for the frame setup opcode. This was proven necessary, see llvm#78001 for discussion.

[InstCombine] Add contributor guide (llvm#79007)

6898147

Document expectations for contributions to InstCombine, especially regarding test coverage and alive2 proofs.

[SLP]Fix a crash for gather node with instructions from different bbs,

8d7a6e2

if cost threshold is very low.

[libc][c11] Add stdio.h's rename() function (llvm#85068)

c04807c

Adds stdio.h's rename() function as defined in n3096. Fixes llvm#84980.

[libc][stdio][test] fixup rename test (llvm#86136)

0c8dfc8

Link: llvm#84980 Link: llvm#85068

[lldb] Reland: Store SupportFile in FileEntry (NFC) (llvm#85892)

556fe5f

This is another step towards supporting DWARF5 checksums and inline source code in LLDB. This is a reland of llvm#85468 but without the functional change of storing the support file from the line table (yet).

[mlir][spirv] Improve folding of MemRef to SPIRV Lowering (llvm#85433)

38f8a3c

Investigate the lowering of MemRef Load/Store ops and implement additional folding of created ops Aims to improve readability of generated lowered SPIR-V code. Part of work llvm#70704

[lldb] Add missing initialization in LineEntry ctor

81bd799

AMDGPU: Use defset to cleanup marking MFMA intrinsics as divergent (l…

d8b0d8d

…lvm#85915)

[flang][NFC] Fix header guards

d9f0d9a

Some header guards conflicted with clang. Fix a few others to follow the convention in the rest of the headers in flang.

alejandro-alvarez-sonarsource and others added 25 commits March 22, 2024 11:50

[clang][analyzer][NFC] UnixAPIMisuseChecker inherits from Checker<che…

a62441d

…ck::PreCall> (llvm#83027)

[DXIL] Complete abs lowering (llvm#86158)

d8e5c0b

This change completes llvm#86155 - `DXIL.td` - lowering `fabs` intrinsic to the float dxil op. - `DXILIntrinsicExpansion.cpp` - Add intrinsic expansion for the abs case.

[DXIL] Add lowerings for cosine and floor (llvm#86173)

79c32eb

Completes llvm#86170 Completes llvm#86172 - `DXIL.td` - Add changes to lower the cosine and floor intrinsics to dxilOps.

CODEOWNERS: extend scope of MLIR transform dialect

db33444

There are a bunch of related directories in other dialects.

[analyzer] Support C++23 static operator calls (llvm#84972)

e925968

Made by following: llvm#83585 (comment) Thanks for the details Tomek! CPP-5080

[X86] Add shuffle tests from Issue llvm#86076

74c3150

SLP should be doing a better job, but both shuffles lower to poorer codegen than necessary

[DAG] Fix some missing formatting when I rewrote the SUB(MAX,MIN) -> …

ceabaa7

…ABD patterns. NFC.

[DirectX][Docs] Add DXILIntrinsicExpansion Pass to DXILArchitecture.r…

d51f1c4

…st (llvm#86198) Completes llvm#84839 --------- Co-authored-by: Farzon Lotfi <[email protected]>

NFC Rename LoadBitcodeIntoNewDbgInforFormat to LoadBitcodeIntoNewDbgI…

fe64b26

…nfoFormat (drop additional 'r' before Format)

[mlir] Remove unused and untested shouldSplitInputFile. (llvm#85622)

e1f50fd

This was changed by llvm#84765 but turned out to be buggy. Since it isn't used and isn't tested, it is probably best to remove it.

[mlir] Extend split marker tests of mlir-opt and mlir-pdll. (llvm…

83da7b6

…#85620) Recently llvm#84765 made the split markers of various tools configurable but did not test *not* using the split markers for two of them. This PR adds those tests.

[X86][Headers] change 'yields' to 'returns' in more places

04a6e0f

[BOLT] Add BB index to BAT (llvm#86044)

3b3de48

[X86] vector-half-conversions.ll - add v4f16->v4i32 fptosi/fptoui tes…

a277dd8

…t coverage

[AArch64] Add a test to show incorrect latencies into Bundle instruct…

f82d018

…ions. NFC

[clang-format] Added AlignConsecutiveTableGenBreakingDAGArgColons opt…

e54af60

…ion. (llvm#86150) The option to specify the style of alignment of the colons inside TableGen's DAGArg.

[MC][COFF][AArch64] Treat ARM64EC/X as ARM64 for relocations (llvm#86019

46b853a

) Since ARM64EC/X objects use regular ARM64 relocations, any special handling must be done for them too.

[C11] Add test & update status of N1282 and DR087

d231e3b

Our existing diagnostics for catching unsequenced modifications handles test coverage for N1282, which is correcting the standard based on the resolution of DR087.

[AutoBump] Merge with 8612fa0

64c0a57

cferry-AMD requested a review from cmcgirr-amd August 16, 2024 10:26

cmcgirr-amd approved these changes Aug 16, 2024

Base automatically changed from bump_to_0aa6d57e to feature/fused-ops August 16, 2024 13:57

cferry-AMD merged commit 6f96182 into feature/fused-ops Aug 16, 2024

cferry-AMD deleted the bump_to_8612fa0d branch August 16, 2024 14:43

75 participants
