Skip to content

Conversation

@mgehre-amd
Copy link
Collaborator

No description provided.

michaelmaitland and others added 30 commits March 18, 2024 13:44
…lvm#84962)

This PR includes an initial scheduler model shows improvement on
multiple workloads over NoSchedModel and SiFive7Model for sifive-p670.
We plan on making significant changes to this model in the future so
that it is more accurate. This patch would close
llvm#80612.
CI checks were passing in llvm#84962 (c48d818) but
that commit caused failures once merged due to ships passing since the
PR was not rebased on llvm#85131. This commit fixes this problem by adding
sched resources for integer min max instructions from Zbb in P600 model.
We're seeing an issue on Macs, which shouldn't be using this config, so
we will temporarily disable this while we investigate.
…ciation. (llvm#85486)

Since we are introducing new multiplies earlier in the arithmetic, the
nsw/nuw flags on later instructions are no longer accurate.

Fixes llvm#85457.
This patch fixes:

  lld/MachO/ObjC.cpp:633:12: error: unused variable 'expectedListSize'
  [-Werror,-Wunused-variable]

  lld/MachO/ObjC.cpp:1034:12: error: unused variable 'newCatDef'
  [-Werror,-Wunused-variable]
This adds an API call ompx_dump_mapping_tables.
This allows users to debug the mapping tables and can be especially
useful for unified shared memory applications to check if the code
behaves in the way it should. The implementation reuses code already
present to dump mapping tables (in a debug setting).

---------

Co-authored-by: Joseph Huber <[email protected]>
Instead of passing VPlan in a number of places, just store it directly
in VPRecipeBuilder. A single instance is only used for a single VPlan.

This simplifies the code and was suggested by @nikolaypanchenko in
llvm#84464.
… with `unreachable`

Since some of the users of `CodeExtractor` like `HotColdSplitting` run
late in the pipeline, returns are not cleaned to `unreachable`. So,
just emit `unreachable` directly if the function is `noreturn`.

Closes llvm#84682
We already handle the `+x` case, and noticed it was missing in the bug
affecting llvm#82555

Proofs: https://alive2.llvm.org/ce/z/WUSvmV

Closes llvm#85345
As an extension to llvm#84751, this adds some extra uses of beforeOrAfterPointer()
instead of UnknownSize.
This patch yields small speed-ups in compiler build and execution times,
but more importantly, reduces the stack depth needed in a build
environment where tail call optimization does not appear to occur.
The folding of the SCALE() intrinsic function is implemented via
multiplication by a power of two; this simplifies handling of
exceptional cases. But sometimes scaling by a power of two requires an
exponent larger or smaller than a floating-point format can represent,
and two multiplications are required.
…lvm#85587)

Excess hexadecimal digits were too significant for rounding purposes,
leading to inappropriate rounding away from zero for some modes.
There's several symbol attributes that cannot be applied to named
constants, but that weren't being checked.
Replace a pointer that should never be null with a reference argument so
that it's always defined.

Fixes llvm#85615.
Reland of llvm#84991

A downstream overlay mode user ran into issues with the isnan macro not
working in our sources with a specific libc configuration. This patch
replaces the last direct includes of math.h with our internal
math_macros.h, along with the necessary build system changes.
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE
nodes to regular LOAD/STORE nodes, make them legal and select those nodes
properly instead. This avoids exposing them to the DAGCombiner.

- AtomicExpand pass no longer casts float/double atomic load/stores to integer
  (FP128 is still casted).
The code that prints ValueObjects is duplicated across two different
cases of the dwim-print command, and a subsequent commit will add a
third case. As such, this commit factors out the common code into a
lambda. A free function was considered, but there is too much
function-local context required in that.

We also reword some of the comments so that they stop counting cases,
making it easier to add other cases later.
This lambda does not capture anything, the `&` is just misleading.
…_AT_specification (llvm#85485)

According to the DWARF spec a DIE that has DW_AT_specification or
DW_AT_abstract_origin can be part of .debug_name if a DIE those
attribute points to has DW_AT_name or DW_AT_linkage_name.
this DCHECK was not valid for hwasan_load1, and was not necessary for
the. the function is written without any assumptions of alignment of the
pointer.
xiaoyang-sde and others added 25 commits March 20, 2024 09:49
## Abstract

This pull request removes the `__workaround_52970` concept. This concept
is a workaround for a bug described in llvm#52970, which causes the compiler
to trigger ADL on a pointer to an incomplete type in an SFINAE context.
This bug is fixed in Clang 14.

## Reference

- [[clang] Don't typo-fix an expression in a SFINAE
context](https://reviews.llvm.org/D117603)
- [[libc++] [ranges] ADL-proof the [range.access]
CPOs.](https://reviews.llvm.org/D116239)
…lvm#84405)

We would like the resolver to be generated eagerly, even if the
versioned function is not called from the current translation
unit. Fixes llvm#81494. It further allows Multi Versioning to work
even if the default target version attribute is omitted from
function declarations.
…m#84545)

This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRLinalgToStandard.a only:
```
In file included from mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h:12,
                 from mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h:21,
                 from mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp:15:
mlir/include/mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h:20:10: fatal error: mlir/Dialect/Vector/Transforms/VectorTransformsEnums.h.inc: No such file or directory
```
Most FIR passes only look for FIR operations inside of functions (either
because they run only on func.func or they run on the module but iterate
over functions internally). But there can also be FIR operations inside
of fir.global, some OpenMP and OpenACC container operations.

This has worked so far for fir.global and OpenMP reductions because they
only contained very simple FIR code which doesn't need most passes to be
lowered into LLVM IR. I am not sure how OpenACC works.

In the long run, I hope to see a more systematic approach to making sure
that every pass runs on all of these container operations. I will write
an RFC for this soon.

In the meantime, this pass duplicates the CFG conversion pass to also
run on omp reduction operations. This is similar to how the
AbstractResult pass is already duplicated for fir.global operations.

OpenMP array reductions 2/6
Previous PR: llvm#84952
Next PR: llvm#84954

---------

Co-authored-by: Mats Petersson <[email protected]>
…84954)

OpenMP reduction declare operations can contain FIR code which needs to
be lowered to LLVM. With array reductions, these regions can contain
more complicated operations which need PreCGRewriting. A similar extra
case was already needed for fir::GlobalOp.

OpenMP array reductions 3/6
Previous PR: llvm#84953
Next PR: llvm#84955
As part of the migration to ptradd
(https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699),
we need to change the representation of the `inrange` attribute, which
is used for vtable splitting.

Currently, inrange is specified as follows:

```
getelementptr inbounds ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, inrange i32 1, i64 2)
```

The `inrange` is placed on a GEP index, and all accesses must be "in
range" of that index. The new representation is as follows:

```
getelementptr inbounds inrange(-16, 16) ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, i32 1, i64 2)
```

This specifies which offsets are "in range" of the GEP result. The new
representation will continue working when canonicalizing to ptradd
representation:

```
getelementptr inbounds inrange(-16, 16) (i8, ptr @vt, i64 48)
```

The inrange offsets are relative to the return value of the GEP. An
alternative design could make them relative to the source pointer
instead. The result-relative format was chosen on the off-chance that we
want to extend support to non-constant GEPs in the future, in which case
this variant is more expressive.

This implementation "upgrades" the old inrange representation in bitcode
by simply dropping it. This is a very niche feature, and I don't think
trying to upgrade it is worthwhile. Let me know if you disagree.
It looks like the mappings for call instructions were forgotten here.
This fixes a bug in OpenMP when in-lining a region containing call
operations multiple times.

OpenMP array reductions 4/6
Previous PR: llvm#84954
Next PR: llvm#84957
…m#85819)

- Give the tablegen record for the Real the same name as the tablegen
  record for the pseudo. This removes all cases where the same
  instruction name has to be mentioned more than once on the definition
  line.
- Use multiclasses for all Real definitions, to allow suffixes to be
  added bit by bit, e.g. first _SADDR and then _gfx11.

This is a similar approach to the one used in BUFInstructions.td.
This reverts commit 9848fa4.

Causes buildbot failures.
…code (llvm#84957)

Moving extractSequenceType to FIRType.h so that this can also be used
from OpenMP.

OpenMP array reductions 5/6
Previous PR: llvm#84955
Next PR: llvm#84958
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRAffineAnalysis.a only:

In file included from
mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp:20:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error:
mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
The test added by llvm#83261
has been consistently failing.  Mark as UNSUPPORTED just like on
x86_64 and aarch64.
…3255)

Allows us to indicate that an addressing mode featuring a
vscale-relative immediate offset is supported.
This fixes the following failure when doing a clean build (in particular
no .ninja* lying around) of lib/libMLIRSPIRVDialect.a only:
```
mlir/lib/Dialect/SPIRV/IR/SPIRVDialect.cpp:17:
mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h:120:10: fatal error: mlir/Dialect/GPU/IR/CompilationAttrInterfaces.h.inc: No such file or directory
```
…tion." (llvm#85914)

Reverts llvm#84405

In between of passing the precommit tests on github and being merged
some change (perhaps in the AArch64 backend?) landed which resulted
in altering the generated resolver. I will regenerate the tests
perhaps using a less sensitive runline to such changes.
These traits can be expressed without having to instantiate any classes,
reducing compile times slightly.
…ate (llvm#84173)

Adds a second parameter (default to 0) to isLegalAddImmediate, to
represent a scalable immediate.

Extends the AArch64 implementation to match immediates based on what addvl and inc[h|w|d] support.
These were disabled due to ODR violations with mixed versions of
clang-tidy and the clang libraries. This issue was fixed in
e19e860.

This reverts commit 0bbada9.
This has been tested with arrays with compile-time constant bounds.
Allocatable arrays and arrays with non-constant bounds are not yet
supported. User-defined reduction functions are also not yet supported.

The design is intended to work for arrays with non-constant bounds too
without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I
haven't fixed yet).

We need some way to get these runtime bounds into the reduction init and
combiner regions. To keep things simple for now I opted to always box
the array arguments so the box can be passed as one argument and the
lower bounds and extents read from the box. This has the disadvantage of
resulting in fir.box_dim operations inside of the critical section. If
these prove to be a performance issue, we could follow OpenACC reading
box lower bounds and extents before the reduction and passing them as
block arguments to the reduction init and combiner regions. I would
prefer to keep things simple for now.

Note: this implementation only works when the HLFIR lowering is used. I
don't think it is worth supporting FIR-only lowering because the plan is
for that to be removed soon.

OpenMP array reductions 6/6
Previous PR: llvm#84957
…the comparison

This makes no real difference currently as we only fold unary shuffles, but I'm hoping to handle binary shuffles in a future patch.
@mgehre-amd mgehre-amd requested a review from cferry-AMD August 15, 2024 12:47
@mgehre-amd mgehre-amd enabled auto-merge August 15, 2024 13:27
@mgehre-amd mgehre-amd merged commit de8cc8f into feature/fused-ops Aug 15, 2024
@mgehre-amd mgehre-amd deleted the matthias.bump_to_fe2119a7b08b branch August 15, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.