@cferry-AMD
Collaborator

No description provided.

chrulski-intel and others added 30 commits March 21, 2024 21:34
This has been a supported option for ELF and is added to the COFF Linker
in llvm#85701
Make form-expressions not create `emitc.expression`s for operations
inside the `emitc.expression`s, since they are invalid.
Kernel descriptor attributes, with their respective emit and asm parse functionality, converted to MCExpr.
…lvm#85990)

macosx and darwin in triples are equivalent.

rdar://124246653
This adds a libc++ to modules.json as is currently used by libc++. When
libc++.so is not found the function will search for libc++.a as
fallback.
…83876)

This adds a new API built with the `ValueBoundsConstraintSet` to compute
the bounds of possibly scalable quantities. It uses knowledge of the
range of vscale (which is defined by the target architecture), to solve
for the bound as either a constant or an expression in terms of vscale.

The result is an `AffineMap` that will always take at most one
parameter, vscale, and returns a single result, which is the bound of
`value`.

The API is defined as follows:

```c++
FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
  Value value, std::optional<int64_t> dim,
  unsigned vscaleMin, unsigned vscaleMax,
  presburger::BoundType boundType, 
  bool closedUB = true,
  StopConditionFn stopCondition = nullptr);
```

Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap`
with a utility for converting the bound to a single quantity (i.e. a
size and scalable flag).
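
To illustrate the "single quantity" conversion mentioned in the note, here is a minimal sketch. It assumes the bound has the form `scale * vscale + offset`. `ToyBound`, `ToySize`, `toSingleQuantity`, and `foldKnownVscale` are hypothetical names for this illustration only and are not the MLIR API.

```c++
#include <cassert>
#include <cstdint>

// Toy stand-in for a bound of the form: scale * vscale + offset.
// This is an illustration only, not mlir::AffineMap.
struct ToyBound {
  int64_t scale;
  int64_t offset;
};

// A "single quantity": a base size plus a flag that says whether the
// size is multiplied by vscale.
struct ToySize {
  int64_t baseSize;
  bool scalable;
};

// Converts the bound to a single quantity when possible. Returns false
// when both terms are non-zero, because such a bound cannot be written
// as one size with one scalable flag.
inline bool toSingleQuantity(const ToyBound &bound, ToySize &out) {
  if (bound.scale == 0) {
    out = {bound.offset, false}; // purely constant bound
    return true;
  }
  if (bound.offset == 0) {
    out = {bound.scale, true}; // pure multiple of vscale
    return true;
  }
  return false;
}

// If the target pins vscale to a single value (vscaleMin == vscaleMax),
// the bound folds to a constant; otherwise it stays in terms of vscale.
inline ToyBound foldKnownVscale(const ToyBound &bound, unsigned vscaleMin,
                                unsigned vscaleMax) {
  if (vscaleMin == vscaleMax)
    return {0, bound.scale * static_cast<int64_t>(vscaleMin) + bound.offset};
  return bound;
}
```

Under these assumptions, a bound of `4 * vscale` converts to a size of 4 with the scalable flag set, while `4 * vscale + 1` only becomes a single quantity once vscale is known.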

We believe this API could prove useful downstream in IREE (which uses a
similar analysis to hoist allocas, which currently fails for scalable
vectors).
NFC.
gfx11_asm_vinterp.s already contained GFX12 run lines. Rename the
assembler and disassembler tests to be sorted based on real16 or fake16
instead of gfxip. Note, both GFX11 and GFX12 currently only have fake16
(fake16 in encoding, but not by name) upstream, so that is why the test
files have a -fake16 suffix.

One test input is changed, and that is the disassembler test for
unsupported bits in the instruction. It is now an input that is valid on
both GFX11 and GFX12. This was necessary because the size of the opcode
field changed.
NFC.
Fix CHECK lines that seem to have a copy paste error.
Move the test that was formerly in gfx12_dasm_vinterp.txt (see llvm#85949).
Check for all frame instructions in finalize isel, not just for the
frame setup opcode. This was proven necessary; see llvm#78001
for discussion.
Document expectations for contributions to InstCombine, especially
regarding test coverage and alive2 proofs.
…plementation (llvm#86125)

Allow targets to rely on TargetLowering::isGuaranteedNotToBeUndefOrPoisonForTargetNode to check that a node cannot create undef or poison (via canCreateUndefOrPoisonForTargetNode) and that all of its arguments are isGuaranteedNotToBeUndefOrPoison.

Targets can still perform this themselves for specific special case nodes (e.g. target shuffles).

Matches the fallback in SelectionDAG::isGuaranteedNotToBeUndefOrPoison
…x on RV32. (llvm#85871)

I believe we can use XLen alignment as long as eliminateFrameIndex
limits the maximum folded offset to 2043. This way, when we split
the load/store into two instructions, we'll be able to add 4
without overflowing simm12.
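
The arithmetic behind the 2043 limit can be checked with a small sketch. `fitsSimm12` and `canFoldSplitOffset` are hypothetical helpers written for this illustration and are not functions from the RISC-V backend.

```c++
#include <cassert>
#include <cstdint>

// A signed 12-bit immediate (simm12) holds values in [-2048, 2047].
constexpr bool fitsSimm12(int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical helper: true if both halves of a split 64-bit access,
// at `offset` and `offset + 4`, can be encoded as simm12 immediates.
constexpr bool canFoldSplitOffset(int64_t offset) {
  return fitsSimm12(offset) && fitsSimm12(offset + 4);
}

// 2043 + 4 == 2047 is the largest simm12 value, so 2043 is the limit.
static_assert(canFoldSplitOffset(2043), "2043 + 4 == 2047 still fits");
static_assert(!canFoldSplitOffset(2044), "2044 + 4 == 2048 overflows");
```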
Adds stdio.h's rename() function as defined in n3096. Fixes llvm#84980.
…STALL_MODULES_DIR (llvm#86020)

This reapplies 272d1b4 (from llvm#85756),
which was reverted in
4079370.

In the previous attempt, an empty CMAKE_INSTALL_PREFIX was handled by
quoting it, in d209d13. That made the
calls to cmake_path(ABSOLUTE_PATH) succeed, but the resulting paths
weren't actually absolute, which file(RELATIVE_PATH) requires.

Avoid this issue by constructing a non-empty base directory variable
to use for calculating the relative path.
…m#83773)

Dwarf 5 allows separating filenames from .debug_line into a separate
.debug_line_str section. The strings are referenced relative to the
start of the .debug_line_str section. Previously, on COFF, the
relocation information instead caused offsets to be relocated to the
base address of the COFF-File. This led to wrong offsets in linked
COFF (PE) files which caused the debugger to be unable to find the
correct source files.

This patch fixes this problem by making the offsets relative to the
start of the .debug_line_str section instead. There should be no
changes for ELF-Files as everything seems to be working there.

A test is also added to ensure that the correct relocation entries are
emitted.
This is another step towards supporting DWARF5 checksums and inline
source code in LLDB. This is a reland of llvm#85468 but without the
functional change of storing the support file from the line table (yet).
The range for these operations is being constructed without the
maximum value for the range due to an incorrect usage of the
ConstantRange constructor.

This causes Float2Int to think the range for 'uitofp i1' only
contains 0 instead of 0 and 1.
Investigate the lowering of MemRef Load/Store ops and implement
additional folding of created ops

Aims to improve readability of generated lowered SPIR-V code.

Part of work llvm#70704
…lvm#86002)

The Doxygen comments for the `details` field of a progress report
currently do not specify that this field will act as the initial set
of details for a progress report that gets updated with
`Progress::Increment()`. This commit clarifies this.
…egardless of Zfa. (llvm#85982)

Previously we used BuildPairF64 and SplitF64 only if Zfa was supported
since they will select register file moves that are only available with
Zfa.

We recently changed the handling of BuildPairF64/SplitF64 for Zdinx to
not go through memory so we should use that for bitcast.

That leaves the D without Zfa case that does need to go through memory.
Previously we let type legalization expand to loads and stores using a
new stack temporary created for each bitcast. After this patch we will
create the loads and stores in the custom inserter and share the same
stack slot for all. This also allows DAGCombiner to optimize when
bitcast is mixed with BuildPairF64/SplitF64.
Some header guards conflicted with clang. Fix a few others to follow the
convention in the rest of the headers in flang.
…m#86041)

We were passing the min and max values of the range to the ConstantRange
constructor, but the constructor expects the upper bound to be 1 more than
the max value so we need to add 1.

We also need to use getNonEmpty so that passing 0, 0 to the constructor
creates a full range rather than an empty range, and so that passing smin,
smax+1 doesn't cause an assertion.

I believe this fixes at least some of the reason llvm#79158 was reverted.
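
The half-open convention described above can be sketched with a toy model over 8-bit values. `ToyRange`, `fromInclusive`, and `fromInclusiveBuggy` are hypothetical names for this illustration. This is not `llvm::ConstantRange`, and it only mirrors the behavior described in this commit message.

```c++
#include <cassert>
#include <cstdint>

// Toy model of a wrapping half-open range [lower, upper) over 8-bit
// values. It is an illustration only, not llvm::ConstantRange.
struct ToyRange {
  uint8_t lower;
  uint8_t upper;
  bool full; // true when the range covers every 8-bit value

  bool contains(uint8_t v) const {
    if (full)
      return true;
    // Width of the range and distance of v from lower, both modulo 256.
    uint8_t width = static_cast<uint8_t>(upper - lower);
    uint8_t dist = static_cast<uint8_t>(v - lower);
    return dist < width;
  }
};

// Build the range covering the inclusive interval [min, max].
// The exclusive upper bound is max + 1. If that wraps around to min,
// the interval covers everything, so return the full range instead of
// an empty one (the role this commit assigns to getNonEmpty).
inline ToyRange fromInclusive(uint8_t min, uint8_t max) {
  uint8_t upper = static_cast<uint8_t>(max + 1);
  if (upper == min)
    return ToyRange{0, 0, true};
  return ToyRange{min, upper, false};
}

// The buggy construction: passes max directly as the exclusive bound,
// which silently drops the maximum value from the range.
inline ToyRange fromInclusiveBuggy(uint8_t min, uint8_t max) {
  return ToyRange{min, max, false};
}
```

In this model, `fromInclusiveBuggy(0, 1)` contains only 0, which matches the 'uitofp i1' symptom reported for Float2Int, while `fromInclusive(0, 1)` contains both 0 and 1.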
…llvm#83027)

According to POSIX 2018:

1. lineptr, n and stream cannot be NULL.
2. If *n is non-zero, *lineptr must point to a region of at least *n
   bytes, or be a NULL pointer.

Additionally, if *lineptr is not NULL, *n must not be undefined.
This change completes llvm#86155
- `DXIL.td` - lowering `fabs` intrinsic to the float dxil op.
- `DXILIntrinsicExpansion.cpp` - Add intrinsic expansion for the abs
case.
Completes llvm#86170
Completes llvm#86172
- `DXIL.td` - Add changes to lower the cosine and floor intrinsics to
dxilOps.
There are a bunch of related directories in other dialects.
Made by following:
llvm#83585 (comment)

Thanks for the details Tomek!

CPP-5080
SLP should be doing a better job, but both shuffles lower to poorer codegen than necessary
…read-safe code (llvm#80813)

**Description**

The documentation of `transform.structured.tile_using_forall` says:

_"It is the user’s responsibility to ensure that num_threads/tile_sizes
is a valid tiling specification (i.e. that only tiles parallel
dimensions, e.g. in the Linalg case)."_

In other words, tiling a non-parallel dimension would generate code with
data races, which is not safe to parallelize. For instance, consider this
example (included in the tests in this PR):

```
func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  %0 = scf.forall (%arg2) in (8) shared_outs(%arg3 = %arg1) -> (tensor<300x8xf32>) {
    %1 = affine.min #map(%arg2)
    %2 = affine.max #map1(%1)
    %3 = affine.apply #map2(%arg2)
    %extracted_slice = tensor.extract_slice %arg0[%3, 0, 0] [%2, 300, 8] [1, 1, 1] : tensor<100x300x8xf32> to tensor<?x300x8xf32>
    %4 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["reduction", "parallel", "parallel"]} ins(%extracted_slice : tensor<?x300x8xf32>) outs(%arg3 : tensor<300x8xf32>) {
    ^bb0(%in: f32, %out: f32):
      %5 = arith.addf %in, %out : f32
      linalg.yield %5 : f32
    } -> tensor<300x8xf32>
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %4 into %arg3[0, 0] [300, 8] [1, 1] : tensor<300x8xf32> into tensor<300x8xf32>
    }
  }
  return %0 : tensor<300x8xf32>
}
```

We can easily see that this is not safe to parallelize because all
threads would be writing to the same position in `%arg3` (in the
`scf.forall.in_parallel`).

This PR detects whether it's safe to `tile_using_forall` and emits a
warning if it is not.

**Brief explanation**
It first generates a vector of affine expressions representing the tile
values and stores it in `dimExprs`. These affine expressions are
compared with the affine expressions coming from the results of the
affine map of each output in the linalg op. So going back to the
previous example, the original transform is:

```
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d1, d2)>

func.func @tile_thread_safety2(%arg0: tensor<100x300x8xf32>, %arg1: tensor<300x8xf32>) -> tensor<300x8xf32> {
  // expected-warning@+1 {{tiling is not thread safe at axis #0}}
  %0 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["reduction", "parallel", "parallel"]} ins(%arg0 : tensor<100x300x8xf32>) outs(%arg1 : tensor<300x8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %1 = arith.addf %in, %out : f32
    linalg.yield %1 : f32
  } -> tensor<300x8xf32>
  return %0 : tensor<300x8xf32>
}

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!transform.any_op) -> !transform.any_op
    %forall, %tiled_generic = transform.structured.tile_using_forall %0 num_threads [8]
          : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
    transform.yield
  }
}
```

The `num_threads` attribute would be represented as `(d0)`. Because the
linalg op has only one output (`arg1`) it would only check against the
results of `#map1`, which are `(d1, d2)`. The idea is to check that all
affine expressions in `dimExprs` are present in the output affine map.
In this example, `d0` is not in `(d1, d2)`, so tiling that axis is
considered not thread safe.
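
The containment check described above can be sketched as follows, under a simplifying assumption: each affine expression is a plain dimension `dN` represented by its index `N`, and the op has a single output. `firstUnsafeTiledDim` is a hypothetical name, not the function added by the PR.

```c++
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model: the affine dimension expression dN is represented by N.
// Returns the first tiled dimension that does not appear among the
// results of the output indexing map, or -1 if every tiled dimension
// is present (that is, tiling is considered thread safe in this model).
inline int firstUnsafeTiledDim(const std::vector<unsigned> &tiledDims,
                               const std::vector<unsigned> &outputMapResults) {
  for (unsigned d : tiledDims) {
    bool present = std::find(outputMapResults.begin(),
                             outputMapResults.end(),
                             d) != outputMapResults.end();
    if (!present)
      return static_cast<int>(d);
  }
  return -1;
}
```

For the example above, the tiled dimensions are `{0}` and the output map results are `{1, 2}`, so the sketch reports axis 0, in line with the expected warning `tiling is not thread safe at axis #0`.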
…ry (llvm#86217)

Summary:
The original intention of the `openmp-add-rpath` option was to add the
rpath to the language runtime directory. However, the current
implementation only adds it to the compiler's resource directory. This
patch adds support for appending the `-rpath` to the compiler's standard
library directory as well. Currently this is `<exe>/../lib/<triple>`.
…nfoFormat

(drop additional 'r' before Format)
This was changed by llvm#84765 but turned out to be buggy. Since it isn't
used and isn't tested, it is probably best to remove it.
…#85620)

Recently llvm#84765 made the split markers of various tools configurable but
did not test *not* using the split markers for two of them. This PR adds
those tests.
…lent of) a C-style cast) (llvm#85263)

Implement [LWG3528](https://wg21.link/LWG3528).
Based on [LWG3528](https://wg21.link/LWG3528) and
http://eel.is/c++draft/description#structure.requirements-9, the
standard allows requirements to be imposed, so we constrain
`std::make_from_tuple` to make it SFINAE-friendly
and to avoid worse diagnostic messages. We still keep the constraints
on `std::__make_from_tuple_impl` so that it
has the same advantages when used alone.

---------

Signed-off-by: yronglin <[email protected]>
…d llvm-lto2 (llvm#86271)

Directly load all bitcode into the new debug info format in `llvm-lto`
and `llvm-lto2`. This means that new-mode bitcode no longer round-trips
back to old-mode after parsing, and that old-mode bitcode gets
auto-upgraded to new-mode debug info (which is the current in-memory
default in LLVM).
…ion. (llvm#86150)

Adds the option to specify the alignment style of the colons inside TableGen's DAGArg.
)

Since ARM64EC/X objects use regular ARM64 relocations, any special
handling must be done for them too.
Our existing diagnostics for catching unsequenced modifications handle
test coverage for N1282, which corrects the standard based on the
resolution of DR087.
…ectives (llvm#84349)

This PR refactors bounds offsetting by combining the two differing
implementations (one applying to initial derived type member map
implementation for descriptors and the other for regular arrays,
effectively allocatable array vs regular array in Fortran) now that it's
a little simpler to do.

The PR also moves the utilization of createAlteredByCaptureMap into
genMapInfoOp, where it will be correctly applied to all MapInfoData,
appropriately offsetting and altering Pointer data set in the kernel
argument structure on the host. This primarily means bounds offsets will
now correctly apply to enter/exit/update map clauses, as opposed to just
the Target directive, as is currently the case. A few Fortran runtime
tests have been added to verify this new behavior.

This PR depends on: llvm#84328 and
is an extraction of the larger derived type member map PR stack (so a
requirement for it to land).
@cferry-AMD cferry-AMD requested a review from cmcgirr-amd August 16, 2024 10:26
Base automatically changed from bump_to_0aa6d57e to feature/fused-ops August 16, 2024 13:57
@cferry-AMD cferry-AMD merged commit 6f96182 into feature/fused-ops Aug 16, 2024
@cferry-AMD cferry-AMD deleted the bump_to_8612fa0d branch August 16, 2024 14:43