forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 5
[AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Tosa Changes Integration #480
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We're currently excluding Wasm.cpp, because it requires emscripten. When using header modules, Wasm.h gets compiled on its own and it also requires emscripten, so we need to exclude both.
…lvm#122244) The motivation for this is to allow us to match strided accesses that are emitted from the loop vectorizer with EVL tail folding (see llvm#122232) In these loops the step isn't loop invariant and is based off of @llvm.experimental.get.vector.length. We can relax this as long as we make sure to construct the updates after the definition inside the loop, instead of the preheader. I presume the restriction was previously added so that the step would dominate the insertion point in the preheader. I can't think of why it wouldn't be safe to calculate it in the loop otherwise.
…lvm#122459) This avoids some of the pending regressions after AMDGPU implements isExtractVecEltCheap. In a case like shl <value, undef>, splat k, because the second operand was fully defined, we would fall through and use the splat value for the first operand, losing the undef high bits. This would result in an additional instruction to handle the high bits. Add some reduced testcases for different opcodes for one of the regressions.
Once again we have excessive TLI hooks with bad defaults. Permit this for 32-bit element vectors, which are just use-different-register. We should permit 16-bit vectors as cheap with legal packed instructions, but I see some mixed improvements and regressions that need investigation.
This reverts commit 9a6433f. ninja check-flang on x86 host fails to compile.
…vm#122672) This avoids regressions in a future AMDGPU commit. Previously we would have a build_vector (extract_vector_elt x), undef with free access to the elements bloated into a shuffle of one element + undef, which has much worse combine support than the extract. Alternatively could check aggressivelyPreferBuildVectorSources, but I'm not sure it's really different than isExtractVecEltCheap.
This showcases a miscompile involving a widened reduction-phi.
AArch64 instructions have a fixed size 4 bytes, no need to compute.
The hdrgen output is C, not C++.
C++11 introduced `noexcept`, but `throw()` can be used in older versions of the language.
…leDeclsByName (llvm#123152) Part for relanding llvm#122887. I split this to test where the performance regession comes from if modules are not used.
…lvm#87474) The proposed patch, in general, tries to transform the below code sequence: x = 1.0 / sqrt (a); r1 = x * x; // same as 1.0 / a r2 = a / sqrt(a); // same as sqrt (a) TO (If x, r1 and r2 are all used further in the code) r1 = 1.0 / a r2 = sqrt (a) x = r1 * r2 The transform tries to make high latency sqrt and div operations independent and also saves on one multiplication. The patch was tested with SPEC17 suite with cpu=neoverse-v2. The performance uplift achieved was: 544.nab_r ~4% No other regressions were observed. Also, no compile time differences were observed with the patch. Closes llvm#54652
Pull Request: llvm#123282
…9218) The intention is to use a "copy" instead of a "sub" to handle the high parts of 64-bit multiply for this specific case. This unlocks copy prop use cases where the copy can be reused by later multiply+add sequences if possible. Fixes: SWDEV-487672, SWDEV-487669
) Close llvm#90154 This patch is also an optimization to the lookup process to utilize the information provided by `export` keyword. Previously, in the lookup process, the `export` keyword only takes part in the check part, it doesn't get involved in the lookup process. That said, previously, in a name lookup for 'name', we would load all of declarations with the name 'name' and check if these declarations are valid or not. It works well. But it is inefficient since it may load declarations that may not be wanted. Note that this patch actually did a trick in the lookup process instead of bring module information to DeclarationName or considering module information when deciding if two declarations are the same. So it may not be a surprise to me if there are missing cases. But it is not a regression. It should be already the case. Issue reports are welcomed. In this patch, I tried to split the big lookup table into a lookup table as before and a module local lookup table, which takes a combination of the ID of the DeclContext and hash value of the primary module name as the key. And refactored `DeclContext::lookup()` method to take the module information. So that a lookup in a DeclContext won't load declarations that are local to **other** modules. And also I think it is already beneficial to split the big lookup table since it may reduce the conflicts during lookups in the hash table. BTW, this patch introduced a **regression** for a reachability rule in C++20 but it was false-negative. See 'clang/test/CXX/module/module.interface/p7.cpp' for details. This patch is not expected to introduce any other regressions for non-c++20-modules users since the module local lookup table should be empty for them.
The code path has been dead since 2019. See a3eb3d3
The libc headers are C, not C++.
This patch fixes: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13908:46: error: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Werror,-Wsign-compare]
Only used by Unix/Program.inc and seem always available. Pull Request: llvm#123288
When iterating over function records, filtered by file name, currently, the iteration goes over all the function records, repeatedly for each source file, essentially giving quadratic behavior. 413647d sped up some cases by keeping track of the indices of the function records corresponding to each file name. This change expands the use of that map to FunctionRecordIterator. On a test case with Firefox's libxul.so and a 2.5MB profile, this brings down the runtime of `llvm-cov export $lib --instr-profile $prof -t lcov` from 12 minutes with 90% spent in skipOtherFiles to 19 seconds with no samples in skipOtherFiles at all under a sampling profiler (with a sampling interval of 1ms). Fixes llvm#62079
We still have GetDescription and DumpStopContext which serve a similar purpose. (The main reason this is bothering me is because I'm working through the uses of (deprecated) Function::GetAddressRange.)
This is allowed as a GCC extension, see https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html.
…vm#122726) …ecord level. This fixes the incorrect diagnostic emitted when compiling the following snippet ``` // string_view.h template<class _CharT> class basic_string_view; typedef basic_string_view<char> string_view; template<class _CharT> class __attribute__((__preferred_name__(string_view))) basic_string_view { public: basic_string_view() { } }; inline basic_string_view<char> foo() { return basic_string_view<char>(); } // A.cppm module; #include "string_view.h" export module A; // Use.cppm module; #include "string_view.h" export module Use; import A; ``` The diagnostic is ``` string_view.h:11:5: error: 'basic_string_view<char>::basic_string_view' from module 'A.<global>' is not present in definition of 'string_view' provided earlier ``` The underlying issue is that deserialization of the `preferred_name` attribute triggers deserialization of `basic_string_view<char>`, which triggers the deserialization of the `preferred_name` attribute again (since it's attached to the `basic_string_view` template). The deserialization logic is implemented in a way that prevents it from going on a loop in a literal sense (it detects early on that it has already seen the `string_view` typedef when trying to start its deserialization for the second time), but leaves the typedef deserialization in an unfinished state. Subsequently, the `string_view` typedef from the deserialized module cannot be merged with the same typedef from `string_view.h`, resulting in the above diagnostic. This PR resolves the problem by delaying the deserialization of the `preferred_name` attribute until the deserialization of the `basic_string_view` template is completed. As a result of deferring, the deserialization of the `preferred_name` attribute doesn't need to go on a loop since the type of the `string_view` typedef is already known when it's deserialized.
When libomp is built with -cf-protection, add endbr instructions to the start of functions for Intel CET support.
This fixes a number of issues introduced in llvm#97130 when LLVM_LIBDIR_SUFFIX is a non-empty string. Make sure that the libdir is always referenced as `lib${LLVM_LIBDIR_SUFFIX}`, not as just `lib` or `${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}`. This is the standard libdir convention for all LLVM subprojects. Using `${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}` would result in a duplicate suffix.
…d to targetShrinkDemandedConstant is not 32 or 64 (llvm#123084) See llvm#123029 for details.
[AutoBump] Merge with d0b641b (Jan 14) (40)
[AutoBump] Merge with fixes of be96bd7 (Jan 14) (41) [Only tested MLIR]
[AutoBump] Merge with 31249e2 (Jan 14) (42)
[AutoBump] Merge with fixes of f09db6a (Jan 14) (43) [Only tested MLIR]
[AutoBump] Merge with 3986cff (Jan 15) (44)
[AutoBump] Merge with 1181921 (Jan 17) (48)
[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]
[AutoBump] Merge with e240261 (Jan 17) (52)
[AutoBump] Merge with fixes of 392622d (Dec 09) (22)[Only tested MLIR][New dependency]
[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]
[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]
[AutoBump] Merge with fixes of 7402521 (Jan 15) (45) [Reverted]
[AutoBump] Merge with 0195ec4 (Jan 15) (46)
…s set, as it causes options be registered more than once
…535ea8eb18 to LLVM
Use the emitc-provided function to check the types instead of checking for float types, as the other arith lowering do.
Otherwise we get hundreds of warnings during cmake.
Tosa Changes Integration
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Notable commits:
7402521 caused passes to not be registered, reverted in 528b284
llvm#119408 caused options to be registered more than once, reverted in 51065a3