forked from llvm/llvm-project
merge amd-staging into amd-feature/wave-transform #613
Closed
cdevadas wants to merge 1,093 commits into amd-feature/wave-transform from amd/dev/cdevadas/wave-transform/merge-from-stg-nov-11
The main improvement is to the mfma tests. There are some mild regressions scattered around, and a few major ones. The worst regressions are in some of the bitcast tests; these are cases where the SGPR argument list runs out and uses VGPRs, and the copies-from-VGPR are misidentified as divergent. Most of the shufflevector tests are also regressions. These end up with cleaner MIR, but then get poor regalloc decisions.
Implement support for the OffsetOfExpr
Upstream ExtVectorElementExpr with result Vector type
…#166724) Per [LWG554](https://cplusplus.github.io/LWG/issue554), the rationale is that even if `true / false` traps, the values causing the trap are the converted `int` values produced by the usual arithmetic conversions, not the original `bool` values. This is also true for all other non-promoted integer types. As a result, `std::numeric_limits<I>::traps` should be `false` if `I` is a non-promoted integer type. Fixes llvm#166053.
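The promotion argument above can be checked at compile time. The following is a minimal sketch, not code from the patch: it only shows that operands narrower than `int` are converted to `int` before the division is evaluated. The helper `bool_reports_traps` is a hypothetical name, and the value it returns depends on the standard library in use, so it is deliberately not asserted.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Operands narrower than int are promoted to int before the division,
// so the divide instruction never sees a bool, char or short value.
static_assert(std::is_same<decltype(true / true), int>::value,
              "bool operands are promoted to int");
static_assert(std::is_same<decltype(short{1} / short{1}), int>::value,
              "short operands are promoted to int");
static_assert(std::is_same<decltype(char{1} / char{1}), int>::value,
              "char operands are promoted to int");

// Types at least as wide as int keep their own type in the expression.
static_assert(std::is_same<decltype(1L / 1L), long>::value,
              "long operands stay long");

// Hypothetical helper: reports what the standard library in use says for bool.
// The result varies between library versions, so no value is assumed here.
inline bool bool_reports_traps() { return std::numeric_limits<bool>::traps; }
```

If the translation unit compiles, the static assertions hold, which is the property the LWG554 rationale relies on.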
…llvm#165779) (llvm#168034) Refer to llvm#158276 for previous hotfix. In Z3, boolean expressions are incompatible with bitvec operators. However, C expressions like `-(5 && a)` will generate such symbolic expressions, which will be further used as an integer. To be compatible with such usages, this fix converts such expressions to integer using the existing `fromCast`.
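As an illustration of the kind of source expression involved, here is a small sketch. The function name `negated_logical_and` is hypothetical and is not taken from the analyzer tests. The logical AND produces a boolean result, and the unary minus then uses that result as an integer, which is the mix the commit message says Z3 cannot represent directly.

```cpp
#include <cassert>

// `5 && a` yields a boolean result; the unary minus converts it to an
// integer (0 or 1) and negates it, giving 0 or -1.
inline int negated_logical_and(int a) { return -(5 && a); }
```

Calling `negated_logical_and(0)` gives 0, and any non-zero argument gives -1.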
Update test to capture unnamed VPValues in variables, making it easier to update with future VPlan changes.
…n. (llvm#167965) Extend willNotFreeBetween to perform simple checking across blocks to support the case where CtxI is in a successor of the block that contains the assume, but the assume's parent is the single predecessor of CtxI's block. This enables using `__builtin_assume_dereferenceable` to vectorize std::find_if and co in practice. End-to-end reproducer: https://godbolt.org/z/6jbsd4EjT PR: llvm#167965
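The usage pattern this targets might look like the sketch below. It is not the linked reproducer. The names `kCount` and `find_first_negative` are hypothetical, the buffer size is kept constant as a conservative assumption, and the builtin call is guarded so the code still compiles where the builtin is unavailable. Whether the loop is actually vectorized depends on the compiler version and flags.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed element count for the buffer being searched.
constexpr std::size_t kCount = 64;

// Searches a buffer of kCount ints for the first negative value.
// The assumption tells the optimizer that the whole buffer may be read,
// even though find_if can exit early.
inline const int *find_first_negative(const int *data) {
#if defined(__has_builtin)
#if __has_builtin(__builtin_assume_dereferenceable)
  __builtin_assume_dereferenceable(data, kCount * sizeof(int));
#endif
#endif
  return std::find_if(data, data + kCount, [](int v) { return v < 0; });
}
```

The function returns `data + kCount` when no element is negative, as `std::find_if` does for any range.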
… __builtin_elementwise_sqrt (llvm#168057) Followup to llvm#165682
…168084) In llvm#165748, constant expressions were allowed in `collectPossibleValues` because we are still using insertelement + shufflevector idioms to represent a scalable vector splat. However, it also accepts some unresolved constants like ptrtoint of globals or pointer difference between two globals. We could ask the user to check this case with the constant folding API. However, since we haven't observed any real-world usefulness in handling constant expressions, I decided to be more conservative and only handle immediate constants in the helper function. With this patch, we don't need to touch the SimplifyCFG part, as the values can only be either ConstantInt or undef/poison values (NB: switch on undef condition is UB). Fix the miscompilation reported by llvm#165748 (comment)
These tests were only checking the specialized prefix, leaving common code unchecked (and incorrect). Checked code was also not using patterns for SSA values.
Construct SCEVs for VPWidenIntOrFpInductionRecipe analogous to VPCanonicalInductionPHIRecipe: create an AddRec with start + step from the recipe. Currently the only impact should be computing more costs of replicating stores directly in VPlan.
…lvm#167918) Use clang linker wrapper to device-link and embed the HIP fat binary directly. Match the CUDA non-RDC flow in the new driver by producing .hipfb like .fatbin. Previously, llvm offload binary was used to package the device IRs and embed them in the host object file. Clang linker wrapper was then used with each host object file to extract the device IRs, perform device linking, bundle the code objects into a fat binary, and wrap it in a host object file, which the host linker merged with the original host object using the '-r' option. However, the host linker in the MSVC toolchain does not support the '-r' option. The new approach still packages the device IRs with llvm offload binary, but instead of embedding the package in a host object, it passes the package to clang linker wrapper directly, where the device IRs are extracted and linked, the fat binary is generated, and the result is embedded in the host object directly. Compared with the old offload driver, this approach can parallelize the device linking for different GPUs by using the parallelization feature of clang linker wrapper. Fixes: SWDEV-565994
Only check up to CtxI (CtxIter) when checking for calls that may free in CtxI's block. Missed update in llvm#167965. This should be NFC, as all current callers pass a terminator that is guaranteed to not free as CtxI.
hopefully resolves oclConformance fails
…7736) As in title. AVX10.x doesn't distinguish between available vector lengths. -mattr=avx10.x-512 and the definition of macros with _512 are kept for compatibility. Bit positions of avx10.1/2 features in compiler-rt and X86TargetParser are synced to match those in GCC.
…lvm#168128) When shrinking and/or to bitset* remove leftover implicit scc def. bitset* instructions do not set scc. Signed-off-by: John Lu <[email protected]>
The section headers present in the DBI stream got lost when using `pdb2yaml` and `yaml2pdb`. They are a list of COFF section headers. The `llvm::object::coff_section` didn't have a YAML mapping, so I added one in llvm-pdbutil. The mapping for COFF sections in ObjectYAML includes the section data itself, so we can't use it here. Creation of the section map and headers in yaml2pdb is done like in LLD: https://github.com/llvm/llvm-project/blob/438a18c1e105ca04e624239644195e48b28b5099/lld/COFF/PDB.cpp#L1695-L1703
This adds additional test coverage for folding FCMP uno (llvm#166823)
Identified with bugprone-unused-local-non-trivial-variable.
Identified with llvm-use-ranges.
Identified with readability-delete-null-pointer.
NumElts is already of type int. Identified with readability-redundant-casting.
This patch is limited to single-word replacements to fix spelling and/or grammar to ease the review process. Punctuation and markdown fixes are specifically excluded.
Simplifies some tests which do not need to pass TC; future changes will require a trip count to always be available.
Replace addMetadata with setMetadata, which sets metadata, updating existing entries or adding a new entry otherwise. This isn't strictly needed at the moment, but will be needed for follow-up patches.
…vm#167065) Otherwise, we end up using whatever system-provided compiler runtime is available, which doesn't work on macOS since compiler-rt is located inside the toolchain path, which can't be found by default. However, disable the tests for compiler-rt since those are linking against the system C++ standard library while using the just-built libc++ headers, which is non-sensical and leads to undefined references on macOS.
… tests (llvm#167346) We want to eliminate all .compile.fail.cpp tests since they are brittle: these tests pass regardless of the specific compilation error, which means that e.g. a missing include will render the test null. This is not an exhaustive pass, just a few tests I stumbled upon.
…#167253) Update VPlan to populate VPIRMetadata during VPInstruction construction and use it when creating widened recipes, instead of constructing VPIRMetadata from the underlying IR instruction each time. This centralizes VPIRMetadata in VPInstructions and ensures metadata is consistently available throughout VPlan transformations. PR: llvm#167253
These are simply implemented as specializations of strtofloatingpoint for double / long double and for wchar_t. The unit tests are copied from the strtod / strtold ones.
…cking safe patterns, if "cond" is a constant (llvm#167989) In `-Wunsafe-buffer-usage`, many safe pattern checks can benefit from constant folding. This commit improves null-terminated pointer checks by folding conditional expressions. rdar://159374822 --------- Co-authored-by: Balázs Benics <[email protected]>
* Adds lowerings for amdgpu.scaled_ext_packed816
* Updates verifiers
…167661) The motivation is to allow passes such as MachineLICM to hoist trivial FMOV instructions out of loops; previously they were not hoisted even when the RHS is a constant. On most architectures, these move instructions are expensive, with a latency of 2-6 cycles, and are certainly not as cheap as a 0-1 cycle move.
Starting in version 15, GCC emits a `.base64` directive instead of `.string` or `.ascii` for char arrays of length `>= 3`. See [this godbolt link](https://godbolt.org/z/ebhe3oenv) for an example. This patch adds support for the .base64 directive to AsmParser.cpp, so tools like `llvm-mc` can process the output of GCC more effectively. This addresses llvm#165499.
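To make the directive's payload concrete, the sketch below decodes a base64 string into the raw bytes an assembler would have to emit. It is an illustrative stand-alone decoder, not the AsmParser.cpp code. The helper names `base64_value` and `decode_base64` are hypothetical, and the sketch assumes the standard base64 alphabet with `=` padding.

```cpp
#include <cassert>
#include <string>

// Maps one character of the standard base64 alphabet to its 6-bit value,
// or returns -1 for any other character.
inline int base64_value(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Decodes base64 text into raw bytes. Decoding stops at the first '='.
// An invalid character makes the sketch return an empty string.
inline std::string decode_base64(const std::string &text) {
  std::string out;
  unsigned buffer = 0; // accumulates 6-bit groups
  int bits = 0;        // number of not yet emitted bits in buffer
  for (char c : text) {
    if (c == '=') break;
    int value = base64_value(c);
    if (value < 0) return std::string();
    buffer = (buffer << 6) | static_cast<unsigned>(value);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
    }
  }
  return out;
}
```

For example, the payload `aGVsbG8A` decodes to the six bytes of `hello` followed by a NUL terminator, which is what a `.string "hello"` directive would have produced.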
…lvm#167981) During the initialization sequence in our tests, the first 'threads' response should only be kept if the process is actually stopped; otherwise we will have stale data. In VSCode, during the debug session startup sequence, a 'threads' request is made immediately after 'configurationDone'. This initial request retrieves the main thread's name and id so the UI can be populated. However, in our tests we do not want to cache this value unless the process is actually stopped. We do need to make this initial request because lldb-dap is caching the initial thread list during configurationDone before the process is resumed. We need to make this call to ensure the cached initial threads are purged. I noticed this in a CI job for another review (https://github.com/llvm/llvm-project/actions/runs/19348261989/job/55353961798) where the tests incorrectly failed to fetch the threads prior to validating the thread names.
There is an extra underscore in build_type param in llvm#167583 patch. Fixing it in this PR.
…lvm#168433) This change adds the ACCImplicitRoutine pass which implements the OpenACC specification for implicit routine directives (OpenACC 3.4 spec, section 2.15.1).

According to the specification: "If no explicit routine directive applies to a procedure whose definition appears in the program unit being compiled, then the implementation applies an implicit routine directive to that procedure if any of the following conditions holds: The procedure is called or its address is accessed in a compute region."

The pass automatically generates `acc.routine` operations for functions called within OpenACC compute constructs or within existing routine functions that do not already have explicit routine directives. It recursively applies implicit routine directives while avoiding infinite recursion when dependencies form cycles.

Key features:
- Walks through all OpenACC compute constructs (parallel, kernels, serial) to identify function calls
- Creates implicit `acc.routine` operations for functions without explicit routine declarations
- Recursively processes existing `acc.routine` operations to handle transitive dependencies
- Avoids infinite recursion through proper tracking of processed routines
- Respects device-type specific bind clauses to skip routines bound to different device types

Requirements:
- Function operations must implement `mlir::FunctionOpInterface` to be identified and associated with routine directives.
- Call operations must implement `mlir::CallOpInterface` to detect function calls and traverse the call graph.
- Optionally pre-register `acc::OpenACCSupport` if custom behavior is needed for determining if a symbol use is valid within GPU regions (such as functions which are already considered for offloading even without `acc routine` markings)

Co-authored-by: delaram-talaashrafi<[email protected]>
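The transitive marking with cycle protection described above can be sketched on a plain call graph. This is a simplified model, not the MLIR pass: the type `CallGraph` and the function `collect_implicit_routines` are hypothetical, functions are represented by name only, and device-type bind clauses are ignored.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns the functions that would need an implicit routine directive:
// everything reachable from calls inside compute regions or from existing
// explicit routines, minus the functions that already have a directive.
inline std::set<std::string> collect_implicit_routines(
    const CallGraph &graph,
    const std::vector<std::string> &called_from_compute,
    const std::set<std::string> &explicit_routines) {
  std::set<std::string> implicit;
  std::set<std::string> visited; // guards against cycles in the call graph
  std::vector<std::string> worklist(called_from_compute);
  worklist.insert(worklist.end(), explicit_routines.begin(),
                  explicit_routines.end());
  while (!worklist.empty()) {
    std::string fn = worklist.back();
    worklist.pop_back();
    if (!visited.insert(fn).second)
      continue; // already processed, so recursion cannot loop forever
    if (explicit_routines.count(fn) == 0)
      implicit.insert(fn);
    auto it = graph.find(fn);
    if (it == graph.end())
      continue;
    for (const std::string &callee : it->second)
      worklist.push_back(callee);
  }
  return implicit;
}
```

With a graph where `a` and `b` call each other, `c` is an explicit routine that calls `d`, and `e` calls `f` outside any compute region, starting from a compute-region call to `a` marks `a`, `b` and `d`, and leaves `e` and `f` untouched.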
This allows SDNodes to be validated against their expected type profiles and reduces the number of changes required to add a new node. The validation functionality has detected several issues, see `PPCSelectionDAGInfo::verifyTargetNode()`. Most of the nodes have a description in `*.td` files and were successfully "imported". Those that don't have a description are listed in the enum in `PPCSelectionDAGInfo.td`. These nodes are not validated. Part of llvm#119709. Pull Request: llvm#168108
We build the callsite graph by first adding nodes and edges for all allocation contexts, then match the interior callsite nodes onto actual calls (IR or summary), which due to inlining may result in the generation of new nodes representing the inlined context sequence. We attempt to update edges correctly during this process, but in the case of recursion this becomes impossible to always get correct. Specifically, when creating new inlined sequence nodes for stack ids on recursive cycles we can't always update correctly, because we have lost the original ordering of the context. This PR introduces a mechanism, guarded by -memprof-top-n-important= flag, to keep track of extra information for the largest N cold contexts. Another flag -memprof-fixup-important (enabled by default) will perform more expensive fixup of the edges for those largest N cold contexts, by saving and walking the original ordered list of stack ids from the context.
Collaborator:
PSDB Build Link: http://mlse-bdc-20dd129:8065/#/builders/10/builds/28
Author:
Not needed any more.
No description provided.