forked from llvm/llvm-project
Bump to fe2119a7b08b with conflict resolution (1) #258
Merged
mgehre-amd
merged 660 commits into
feature/fused-ops
from
matthias.bump_to_fe2119a7b08b
Aug 15, 2024
…lvm#84962) This PR includes an initial scheduler model that shows improvement on multiple workloads over NoSchedModel and SiFive7Model for sifive-p670. We plan on making significant changes to this model in the future so that it is more accurate. This patch would close llvm#80612.
CI checks were passing in llvm#84962 (c48d818) but that commit caused failures once merged due to ships passing since the PR was not rebased on llvm#85131. This commit fixes this problem by adding sched resources for integer min max instructions from Zbb in P600 model.
We're seeing an issue on Macs, which shouldn't be using this config, so we will temporarily disable this while we investigate.
…ciation. (llvm#85486) Since we are introducing new multiplies earlier in the arithmetic, the nsw/nuw flags on later instructions are no longer accurate. Fixes llvm#85457.
This patch fixes:
lld/MachO/ObjC.cpp:633:12: error: unused variable 'expectedListSize' [-Werror,-Wunused-variable]
lld/MachO/ObjC.cpp:1034:12: error: unused variable 'newCatDef' [-Werror,-Wunused-variable]
This adds an API call ompx_dump_mapping_tables. This allows users to debug the mapping tables and can be especially useful for unified shared memory applications to check if the code behaves in the way it should. The implementation reuses code already present to dump mapping tables (in a debug setting). --------- Co-authored-by: Joseph Huber <[email protected]>
Instead of passing VPlan in a number of places, just store it directly in VPRecipeBuilder. A single instance is only used for a single VPlan. This simplifies the code and was suggested by @nikolaypanchenko in llvm#84464.
This reverts commit e419084. Likely cause of buildbot failure: https://lab.llvm.org/buildbot/#/builders/179/builds/9629
…vm#85458) Setting GCC_INSTALL_PREFIX leads to a warning (llvm#77537). Link: https://discourse.llvm.org/t/add-gcc-install-dir-deprecate-gcc-toolchain-and-remove-gcc-install-prefix/65091 Link: https://discourse.llvm.org/t/correct-cmake-parameters-for-building-clang-and-lld-for-riscv/72833
… with `unreachable` Since some of the users of `CodeExtractor` like `HotColdSplitting` run late in the pipeline, returns are not cleaned to `unreachable`. So, just emit `unreachable` directly if the function is `noreturn`. Closes llvm#84682
We already handle the `+x` case, and noticed it was missing in the bug affecting llvm#82555 Proofs: https://alive2.llvm.org/ce/z/WUSvmV Closes llvm#85345
As an extension to llvm#84751, this adds some extra uses of beforeOrAfterPointer() instead of UnknownSize.
This patch yields small speed-ups in compiler build and execution times, but more importantly, reduces the stack depth needed in a build environment where tail call optimization does not appear to occur.
The folding of the SCALE() intrinsic function is implemented via multiplication by a power of two; this simplifies handling of exceptional cases. But sometimes scaling by a power of two requires an exponent larger or smaller than a floating-point format can represent, and two multiplications are required.
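The splitting described above can be illustrated with a minimal sketch for IEEE-754 binary64. This is not the flang folding code: `scale_by_powers_of_two`, `MAX_STEP`, and `MIN_STEP` are hypothetical names, and the sketch makes no attempt to avoid double rounding when an intermediate product is subnormal.

```python
import math

# Assumption: IEEE-754 binary64, where 2.0**1023 is the largest finite
# power of two and 2.0**-1022 is the smallest normal power of two.
MAX_STEP = 1023
MIN_STEP = -1022


def scale_by_powers_of_two(x: float, i: int) -> float:
    """Compute x * 2**i using only multiplications by powers of two.

    When 2**i itself is not representable, the exponent is split into
    several in-range steps; the common out-of-range case needs two.
    """
    # Stop early once the value is zero, infinite, or NaN: further
    # multiplications by a power of two cannot change it.
    while i != 0 and x != 0.0 and math.isfinite(x):
        step = max(MIN_STEP, min(MAX_STEP, i))
        x *= math.ldexp(1.0, step)  # exact power of two, 2.0**step
        i -= step
    return x
```

For example, scaling `2.0**-100` by `2**1100` has a representable result (`2.0**1000`) even though `2.0**1100` is not representable, so the sketch multiplies by `2.0**1023` and then by `2.0**77`.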
…lvm#85587) Excess hexadecimal digits were too significant for rounding purposes, leading to inappropriate rounding away from zero for some modes.
There are several symbol attributes that cannot be applied to named constants, but that weren't being checked.
Replace a pointer that should never be null with a reference argument so that it's always defined. Fixes llvm#85615.
Reland of llvm#84991 A downstream overlay mode user ran into issues with the isnan macro not working in our sources with a specific libc configuration. This patch replaces the last direct includes of math.h with our internal math_macros.h, along with the necessary build system changes.
- Instead of lowering float/double ISD::ATOMIC_LOAD / ISD::ATOMIC_STORE nodes to regular LOAD/STORE nodes, make them legal and select those nodes properly instead. This avoids exposing them to the DAGCombiner.
- The AtomicExpand pass no longer casts float/double atomic loads/stores to integer (FP128 is still cast).
The code that prints ValueObjects is duplicated across two different cases of the dwim-print command, and a subsequent commit will add a third case. As such, this commit factors out the common code into a lambda. A free function was considered, but there is too much function-local context required in that. We also reword some of the comments so that they stop counting cases, making it easier to add other cases later.
This lambda does not capture anything; the `&` is just misleading.
…_AT_specification (llvm#85485) According to the DWARF spec, a DIE that has DW_AT_specification or DW_AT_abstract_origin can be part of .debug_names if the DIE that the attribute points to has DW_AT_name or DW_AT_linkage_name.
This DCHECK was not valid for hwasan_load1 and was not necessary. The function is written without any assumptions about the alignment of the pointer.
## Abstract

This pull request removes the `__workaround_52970` concept. This concept is a workaround for a bug described in llvm#52970, which causes the compiler to trigger ADL on a pointer to an incomplete type in an SFINAE context. This bug is fixed in Clang 14.

## Reference

- [[clang] Don't typo-fix an expression in a SFINAE context](https://reviews.llvm.org/D117603)
- [[libc++] [ranges] ADL-proof the [range.access] CPOs.](https://reviews.llvm.org/D116239)
Authored-by: Pravin Jagtap <[email protected]>
…lvm#84405) We would like the resolver to be generated eagerly, even if the versioned function is not called from the current translation unit. Fixes llvm#81494. It further allows Multi Versioning to work even if the default target version attribute is omitted from function declarations.
…m#84545) This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIRLinalgToStandard.a only:

```
In file included from mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h:12,
                 from mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h:21,
                 from mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp:15:
mlir/include/mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h:20:10: fatal error: mlir/Dialect/Vector/Transforms/VectorTransformsEnums.h.inc: No such file or directory
```
Most FIR passes only look for FIR operations inside of functions (either because they run only on func.func or they run on the module but iterate over functions internally). But there can also be FIR operations inside of fir.global, some OpenMP and OpenACC container operations. This has worked so far for fir.global and OpenMP reductions because they only contained very simple FIR code which doesn't need most passes to be lowered into LLVM IR. I am not sure how OpenACC works.

In the long run, I hope to see a more systematic approach to making sure that every pass runs on all of these container operations. I will write an RFC for this soon.

In the meantime, this pass duplicates the CFG conversion pass to also run on omp reduction operations. This is similar to how the AbstractResult pass is already duplicated for fir.global operations.

OpenMP array reductions 2/6
Previous PR: llvm#84952
Next PR: llvm#84954

---------

Co-authored-by: Mats Petersson <[email protected]>
…84954) OpenMP reduction declare operations can contain FIR code which needs to be lowered to LLVM. With array reductions, these regions can contain more complicated operations which need PreCGRewriting. A similar extra case was already needed for fir::GlobalOp. OpenMP array reductions 3/6 Previous PR: llvm#84953 Next PR: llvm#84955
As part of the migration to ptradd (https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699), we need to change the representation of the `inrange` attribute, which is used for vtable splitting.

Currently, inrange is specified as follows:

```
getelementptr inbounds ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, inrange i32 1, i64 2)
```

The `inrange` is placed on a GEP index, and all accesses must be "in range" of that index.

The new representation is as follows:

```
getelementptr inbounds inrange(-16, 16) ({ [4 x ptr], [4 x ptr] }, ptr @vt, i64 0, i32 1, i64 2)
```

This specifies which offsets are "in range" of the GEP result. The new representation will continue working when canonicalizing to ptradd representation:

```
getelementptr inbounds inrange(-16, 16) (i8, ptr @vt, i64 48)
```

The inrange offsets are relative to the return value of the GEP. An alternative design could make them relative to the source pointer instead. The result-relative format was chosen on the off-chance that we want to extend support to non-constant GEPs in the future, in which case this variant is more expressive.

This implementation "upgrades" the old inrange representation in bitcode by simply dropping it. This is a very niche feature, and I don't think trying to upgrade it is worthwhile. Let me know if you disagree.
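The numbers in the vtable example, `inrange(-16, 16)` and the byte offset `i64 48`, can be checked with a little arithmetic. The sketch below assumes a 64-bit target where each `ptr` is 8 bytes, and a struct made only of pointer arrays (so no padding). `inrange_bounds` is a hypothetical helper for illustration, not an LLVM API.

```python
PTR_SIZE = 8  # assumption: 64-bit target, so each `ptr` occupies 8 bytes


def inrange_bounds(array_lengths, array_index, element_index, ptr_size=PTR_SIZE):
    """Return (gep_offset, (lo, hi)) for a struct of pointer arrays.

    gep_offset is the byte offset of the GEP result from the base pointer.
    lo and hi are the bounds of the selected array, expressed relative to
    the GEP result, which is the form used by the new inrange(lo, hi).
    """
    # Byte range [start, end) covered by the selected array in the struct.
    start = sum(array_lengths[:array_index]) * ptr_size
    end = start + array_lengths[array_index] * ptr_size
    # Byte offset of the addressed element, i.e. the GEP result.
    gep_offset = start + element_index * ptr_size
    return gep_offset, (start - gep_offset, end - gep_offset)


# { [4 x ptr], [4 x ptr] } indexed with i32 1, i64 2
offset, bounds = inrange_bounds([4, 4], array_index=1, element_index=2)
print(offset, bounds)
```

Under these assumptions the second array covers bytes 32 to 64 of the struct and the GEP result sits at byte 48, which gives the result-relative range from -16 to 16.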
It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when in-lining a region containing call operations multiple times. OpenMP array reductions 4/6 Previous PR: llvm#84954 Next PR: llvm#84957
…m#85819)
- Give the tablegen record for the Real the same name as the tablegen record for the pseudo. This removes all cases where the same instruction name has to be mentioned more than once on the definition line.
- Use multiclasses for all Real definitions, to allow suffixes to be added bit by bit, e.g. first _SADDR and then _gfx11. This is a similar approach to the one used in BUFInstructions.td.
This reverts commit 9848fa4. Causes buildbot failures.
…code (llvm#84957) Moving extractSequenceType to FIRType.h so that this can also be used from OpenMP. OpenMP array reductions 5/6 Previous PR: llvm#84955 Next PR: llvm#84958
This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIRAffineAnalysis.a only:
In file included from mlir/lib/Dialect/Affine/Analysis/AffineAnalysis.cpp:20:
mlir/include/mlir/Dialect/Func/IR/FuncOps.h:29:10: fatal error: mlir/Dialect/Func/IR/FuncOps.h.inc: No such file or directory
The test added by llvm#83261 has been consistently failing. Mark as UNSUPPORTED just like on x86_64 and aarch64.
…3255) Allows us to indicate that an addressing mode featuring a vscale-relative immediate offset is supported.
This fixes the following failure when doing a clean build (in particular no .ninja* lying around) of lib/libMLIRSPIRVDialect.a only:

```
mlir/lib/Dialect/SPIRV/IR/SPIRVDialect.cpp:17:
mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h:120:10: fatal error: mlir/Dialect/GPU/IR/CompilationAttrInterfaces.h.inc: No such file or directory
```
…tion." (llvm#85914) Reverts llvm#84405. Between passing the precommit tests on GitHub and being merged, some change landed (perhaps in the AArch64 backend?) that altered the generated resolver. I will regenerate the tests, perhaps using a run line that is less sensitive to such changes.
These traits can be expressed without having to instantiate any classes, reducing compile times slightly.
…ate (llvm#84173) Adds a second parameter (default to 0) to isLegalAddImmediate, to represent a scalable immediate. Extends the AArch64 implementation to match immediates based on what addvl and inc[h|w|d] support.
This has been tested with arrays with compile-time constant bounds. Allocatable arrays and arrays with non-constant bounds are not yet supported. User-defined reduction functions are also not yet supported.

The design is intended to work for arrays with non-constant bounds too without a lot of extra work (mostly there are bugs in OpenMPIRBuilder I haven't fixed yet).

We need some way to get these runtime bounds into the reduction init and combiner regions. To keep things simple for now I opted to always box the array arguments so the box can be passed as one argument and the lower bounds and extents read from the box. This has the disadvantage of resulting in fir.box_dim operations inside of the critical section. If these prove to be a performance issue, we could follow OpenACC reading box lower bounds and extents before the reduction and passing them as block arguments to the reduction init and combiner regions. I would prefer to keep things simple for now.

Note: this implementation only works when the HLFIR lowering is used. I don't think it is worth supporting FIR-only lowering because the plan is for that to be removed soon.

OpenMP array reductions 6/6
Previous PR: llvm#84957
…the comparison This makes no real difference currently as we only fold unary shuffles, but I'm hoping to handle binary shuffles in a future patch.
cferry-AMD
approved these changes
Aug 15, 2024
No description provided.