[AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Tosa Changes Integration #480

jorickert · 2025-02-20T17:02:06Z

Notable commits:
7402521 caused passes to not be registered, reverted in 528b284

llvm#119408 caused options to be registered more than once, reverted in 51065a3

We're currently excluding Wasm.cpp, because it requires emscripten. When using header modules, Wasm.h gets compiled on its own and it also requires emscripten, so we need to exclude both.

…lvm#122244) The motivation for this is to allow us to match strided accesses that are emitted from the loop vectorizer with EVL tail folding (see llvm#122232) In these loops the step isn't loop invariant and is based off of @llvm.experimental.get.vector.length. We can relax this as long as we make sure to construct the updates after the definition inside the loop, instead of the preheader. I presume the restriction was previously added so that the step would dominate the insertion point in the preheader. I can't think of why it wouldn't be safe to calculate it in the loop otherwise.

…lvm#122459) This avoids some of the pending regressions after AMDGPU implements isExtractVecEltCheap. In a case like shl <value, undef>, splat k, because the second operand was fully defined, we would fall through and use the splat value for the first operand, losing the undef high bits. This would result in an additional instruction to handle the high bits. Add some reduced testcases for different opcodes for one of the regressions.

Once again we have excessive TLI hooks with bad defaults. Permit this for 32-bit element vectors, which are just use-different-register. We should permit 16-bit vectors as cheap with legal packed instructions, but I see some mixed improvements and regressions that need investigation.

This reverts commit 9a6433f. ninja check-flang on x86 host fails to compile.

…vm#122672) This avoids regressions in a future AMDGPU commit. Previously we would have a build_vector (extract_vector_elt x), undef with free access to the elements bloated into a shuffle of one element + undef, which has much worse combine support than the extract. Alternatively could check aggressivelyPreferBuildVectorSources, but I'm not sure it's really different than isExtractVecEltCheap.

This showcases a miscompile involving a widened reduction-phi.

AArch64 instructions have a fixed size 4 bytes, no need to compute.

The hdrgen output is C, not C++.

C++11 introduced `noexcept`, but `throw()` can be used in older versions of the language.

…leDeclsByName (llvm#123152) Part for relanding llvm#122887. I split this to test where the performance regression comes from if modules are not used.

…lvm#87474) The proposed patch, in general, tries to transform the below code sequence:

    x = 1.0 / sqrt (a);
    r1 = x * x;        // same as 1.0 / a
    r2 = a / sqrt(a);  // same as sqrt (a)

TO (If x, r1 and r2 are all used further in the code)

    r1 = 1.0 / a
    r2 = sqrt (a)
    x = r1 * r2

The transform tries to make high latency sqrt and div operations independent and also saves on one multiplication.

The patch was tested with SPEC17 suite with cpu=neoverse-v2. The performance uplift achieved was:

    544.nab_r ~4%

No other regressions were observed. Also, no compile time differences were observed with the patch.

Closes llvm#54652
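To make the rewrite described above concrete, here is a minimal C++ sketch that computes the three values both ways. It is not the InstCombine implementation; the names `SqrtDivResults`, `beforeTransform`, `afterTransform` and `nearlyEqual` are hypothetical and exist only for this illustration.

```cpp
#include <cmath>

// The three values the commit message calls x, r1 and r2.
struct SqrtDivResults {
  double x;   // 1.0 / sqrt(a)
  double r1;  // 1.0 / a
  double r2;  // sqrt(a)
};

// Original sequence: the divisions depend on the result of sqrt.
SqrtDivResults beforeTransform(double a) {
  double x = 1.0 / std::sqrt(a);
  double r1 = x * x;             // same as 1.0 / a
  double r2 = a / std::sqrt(a);  // same as sqrt(a)
  return {x, r1, r2};
}

// Rewritten sequence: the division and the sqrt no longer depend on
// each other, and x costs a single multiplication.
SqrtDivResults afterTransform(double a) {
  double r1 = 1.0 / a;
  double r2 = std::sqrt(a);
  double x = r1 * r2;
  return {x, r1, r2};
}

// Relative comparison; the two sequences round differently, so the
// results are expected to agree only approximately.
bool nearlyEqual(double lhs, double rhs, double relTol = 1e-12) {
  double scale = std::fmax(std::fabs(lhs), std::fabs(rhs));
  return std::fabs(lhs - rhs) <= relTol * scale;
}
```

The two sequences are algebraically equal but not bit-identical in general, so a rewrite like this presumably only applies where the compiler is permitted to relax strict floating-point semantics.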

Pull Request: llvm#123282

…9218) The intention is to use a "copy" instead of a "sub" to handle the high parts of 64-bit multiply for this specific case. This unlocks copy prop use cases where the copy can be reused by later multiply+add sequences if possible. Fixes: SWDEV-487672, SWDEV-487669

) Close llvm#90154

This patch is also an optimization to the lookup process to utilize the information provided by the `export` keyword. Previously, the `export` keyword only took part in the checking step and was not involved in the lookup itself. That is, in a name lookup for 'name', we would load all declarations with the name 'name' and then check whether these declarations are valid. This works, but it is inefficient since it may load declarations that are not wanted.

Note that this patch uses a trick in the lookup process instead of bringing module information into DeclarationName or considering module information when deciding whether two declarations are the same. So it would not surprise me if there are missing cases. But that is not a regression, since it should already be the case today. Issue reports are welcome.

In this patch, I split the big lookup table into a lookup table as before and a module local lookup table, which takes a combination of the ID of the DeclContext and the hash value of the primary module name as the key. I also refactored the `DeclContext::lookup()` method to take the module information, so that a lookup in a DeclContext won't load declarations that are local to **other** modules.

I also think it is already beneficial to split the big lookup table, since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in C++20, but it was a false negative. See 'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other regressions for non-C++20-modules users, since the module local lookup table should be empty for them.
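The split described above can be pictured with a small stand-alone sketch. This is not Clang's data structure: `SplitLookupTable`, its methods, and the use of strings to stand in for declarations are all assumptions made for illustration. Only the idea of keying the module-local table by DeclContext ID plus a hash of the module name comes from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using DeclContextID = std::uint64_t;

// Hypothetical model: one table for generally visible declarations and
// one for module-local declarations.
class SplitLookupTable {
public:
  void addDecl(DeclContextID context, const std::string &name,
               const std::string &decl) {
    general_[std::make_tuple(context, name)].push_back(decl);
  }

  void addModuleLocalDecl(DeclContextID context,
                          const std::string &owningModule,
                          const std::string &name, const std::string &decl) {
    moduleLocal_[std::make_tuple(context, hashModule(owningModule), name)]
        .push_back(decl);
  }

  // Returns the general declarations plus only those module-local
  // declarations owned by currentModule; entries local to other modules
  // are never touched.
  std::vector<std::string> lookup(DeclContextID context,
                                  const std::string &name,
                                  const std::string &currentModule) const {
    std::vector<std::string> result;
    auto general = general_.find(std::make_tuple(context, name));
    if (general != general_.end())
      result = general->second;
    auto local = moduleLocal_.find(
        std::make_tuple(context, hashModule(currentModule), name));
    if (local != moduleLocal_.end())
      result.insert(result.end(), local->second.begin(), local->second.end());
    return result;
  }

private:
  // Keying on a hash means two module names that collide would share
  // entries; this sketch ignores that case.
  static std::size_t hashModule(const std::string &moduleName) {
    return std::hash<std::string>{}(moduleName);
  }

  std::map<std::tuple<DeclContextID, std::string>, std::vector<std::string>>
      general_;
  std::map<std::tuple<DeclContextID, std::size_t, std::string>,
           std::vector<std::string>>
      moduleLocal_;
};
```

In this model, a translation unit that uses no named modules never populates the module-local table, which matches the expectation that such users see no change.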

The code path has been dead since 2019. See a3eb3d3

The libc headers are C, not C++.

This patch fixes: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13908:46: error: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Werror,-Wsign-compare]

Only used by Unix/Program.inc and seem always available. Pull Request: llvm#123288

…to FMUL" (llvm#123289) Reverts llvm#87474

When iterating over function records, filtered by file name, currently, the iteration goes over all the function records, repeatedly for each source file, essentially giving quadratic behavior. 413647d sped up some cases by keeping track of the indices of the function records corresponding to each file name. This change expands the use of that map to FunctionRecordIterator. On a test case with Firefox's libxul.so and a 2.5MB profile, this brings down the runtime of `llvm-cov export $lib --instr-profile $prof -t lcov` from 12 minutes with 90% spent in skipOtherFiles to 19 seconds with no samples in skipOtherFiles at all under a sampling profiler (with a sampling interval of 1ms). Fixes llvm#62079
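The difference between the two iteration strategies can be sketched as follows. This is a simplified model, not the llvm-cov code: `FunctionRecord`, `FileIndex`, and the three functions are hypothetical names, and records are reduced to a name plus a list of file names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for a coverage function record.
struct FunctionRecord {
  std::string name;
  std::vector<std::string> filenames;
};

// Linear filter: every query scans all records, so querying once per
// source file is quadratic overall.
std::vector<std::size_t>
recordsForFileLinear(const std::vector<FunctionRecord> &records,
                     const std::string &file) {
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < records.size(); ++i) {
    const std::vector<std::string> &names = records[i].filenames;
    if (std::find(names.begin(), names.end(), file) != names.end())
      indices.push_back(i);
  }
  return indices;
}

using FileIndex = std::unordered_map<std::string, std::vector<std::size_t>>;

// Built once: maps each file name to the indices of its records.
FileIndex buildFileIndex(const std::vector<FunctionRecord> &records) {
  FileIndex index;
  for (std::size_t i = 0; i < records.size(); ++i) {
    for (const std::string &file : records[i].filenames) {
      std::vector<std::size_t> &indices = index[file];
      // Skip duplicates when a record lists the same file twice.
      if (indices.empty() || indices.back() != i)
        indices.push_back(i);
    }
  }
  return index;
}

// Indexed filter: a hash lookup instead of a scan over all records.
std::vector<std::size_t> recordsForFileIndexed(const FileIndex &index,
                                               const std::string &file) {
  auto it = index.find(file);
  if (it == index.end())
    return {};
  return it->second;
}
```

With the index in place, each per-file query touches only the records that belong to that file, which is the kind of change that removes the time spent skipping unrelated records.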

We still have GetDescription and DumpStopContext which serve a similar purpose. (The main reason this is bothering me is because I'm working through the uses of (deprecated) Function::GetAddressRange.)

This is allowed as a GCC extension, see https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html.

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/784266

…vm#122726) …ecord level. This fixes the incorrect diagnostic emitted when compiling the following snippet

```
// string_view.h
template<class _CharT>
class basic_string_view;

typedef basic_string_view<char> string_view;

template<class _CharT>
class
__attribute__((__preferred_name__(string_view)))
basic_string_view {
public:
    basic_string_view()
    {
    }
};

inline basic_string_view<char> foo()
{
  return basic_string_view<char>();
}

// A.cppm
module;
#include "string_view.h"
export module A;

// Use.cppm
module;
#include "string_view.h"
export module Use;
import A;
```

The diagnostic is

```
string_view.h:11:5: error: 'basic_string_view<char>::basic_string_view' from module 'A.<global>' is not present in definition of 'string_view' provided earlier
```

The underlying issue is that deserialization of the `preferred_name` attribute triggers deserialization of `basic_string_view<char>`, which triggers the deserialization of the `preferred_name` attribute again (since it's attached to the `basic_string_view` template). The deserialization logic is implemented in a way that prevents it from going on a loop in a literal sense (it detects early on that it has already seen the `string_view` typedef when trying to start its deserialization for the second time), but leaves the typedef deserialization in an unfinished state. Subsequently, the `string_view` typedef from the deserialized module cannot be merged with the same typedef from `string_view.h`, resulting in the above diagnostic.

This PR resolves the problem by delaying the deserialization of the `preferred_name` attribute until the deserialization of the `basic_string_view` template is completed. As a result of deferring, the deserialization of the `preferred_name` attribute doesn't need to go on a loop since the type of the `string_view` typedef is already known when it's deserialized.

When libomp is built with -cf-protection, add endbr instructions to the start of functions for Intel CET support.

This fixes a number of issues introduced in llvm#97130 when LLVM_LIBDIR_SUFFIX is a non-empty string. Make sure that the libdir is always referenced as `lib${LLVM_LIBDIR_SUFFIX}`, not as just `lib` or `${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}`. This is the standard libdir convention for all LLVM subprojects. Using `${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}` would result in a duplicate suffix.

…d to targetShrinkDemandedConstant is not 32 or 64 (llvm#123084) See llvm#123029 for details.

[AutoBump] Merge with d0b641b (Jan 14) (40)

[AutoBump] Merge with fixes of be96bd7 (Jan 14) (41) [Only tested MLIR]

[AutoBump] Merge with 31249e2 (Jan 14) (42)

[AutoBump] Merge with fixes of f09db6a (Jan 14) (43) [Only tested MLIR]

[AutoBump] Merge with 3986cff (Jan 15) (44)

[AutoBump] Merge with 1181921 (Jan 17) (48)

[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]

[AutoBump] Merge with e240261 (Jan 17) (52)

[AutoBump] Merge with fixes of 392622d (Dec 09) (22)[Only tested MLIR][New dependency]

[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]

[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]

[AutoBump] Merge with fixes of 7402521 (Jan 15) (45) [Reverted]

[AutoBump] Merge with 0195ec4 (Jan 15) (46)

…s set, as it causes options to be registered more than once

…535ea8eb18 to LLVM

Use the emitc-provided function to check the types instead of checking for float types, as the other arith lowerings do.

Otherwise we get hundreds of warnings during cmake.

Tosa Changes Integration

…1b2c8f10

slackito and others added 30 commits January 16, 2025 16:54

[bazel] Exclude lib/Interpreter/Wasm.h from //clang:interpreter

a5bd01e

We're currently excluding Wasm.cpp, because it requires emscripten. When using header modules, Wasm.h gets compiled on its own and it also requires emscripten, so we need to exclude both.

Revert "[flang] Inline hlfir.dot_product. (llvm#123143)"

afc43a7

This reverts commit 9a6433f. ninja check-flang on x86 host fails to compile.

[LV] Add test case for llvm#119173. NFC

e83e0c3

This showcases a miscompile involving a widened reduction-phi.

[BOLT][AArch64] Speedup computeInstructionSize (llvm#121106)

1fa02b9

AArch64 instructions have a fixed size 4 bytes, no need to compute.

[libc] Fix hdrgen output for no-argument functions (llvm#123245)

906cbbb

The hdrgen output is C, not C++.

[libc] Fix deprecated operator"" syntax (llvm#123259)

421fc04

[libc] Make headers compatible with C++ < 11 (llvm#123260)

a4e87da

C++11 introduced `noexcept`, but `throw()` can be used in older versions of the language.
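A minimal sketch of the usual way to support both spellings is a macro that selects one based on the language version. The macro name `EXAMPLE_NOEXCEPT` and the function `exampleAbs` are hypothetical and are not taken from the libc headers.

```cpp
// EXAMPLE_NOEXCEPT is a hypothetical macro name chosen for this sketch.
#if __cplusplus >= 201103L
#define EXAMPLE_NOEXCEPT noexcept
#else
#define EXAMPLE_NOEXCEPT throw()
#endif

// Declared non-throwing under either language version.
inline int exampleAbs(int value) EXAMPLE_NOEXCEPT {
  return value < 0 ? -value : value;
}
```

Selecting on `__cplusplus` keeps `throw()` out of builds for newer standards, where the dynamic exception specification is deprecated and later removed.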

[AST] Add OriginalDC argument to ExternalASTSource::FindExternalVisib…

263fed7

…leDeclsByName (llvm#123152) Part for relanding llvm#122887. I split this to test where the performance regression comes from if modules are not used.

[CMake] Remove some unneeded HAVE_*_H

f999b11

Pull Request: llvm#123282

[CMake] Remove HAVE_TERMIOS_H

86a81d4

The code path has been dead since 2019. See a3eb3d3

[CMake] Remove HAVE_SYS_IOCTL_H

219beb7

[libc] Fix sigset_t type definition (llvm#123277)

7710453

The libc headers are C, not C++.

[AMDGPU] Fix a warning

bfb6bb6

This patch fixes: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13908:46: error: comparison of integers of different signs: 'uint32_t' (aka 'unsigned int') and 'int' [-Werror,-Wsign-compare]

[CMake] Remove HAVE_SYS_RESOURCE_H/HAVE_SETRLIMIT/HAVE_GETRLIMIT

414980d

Only used by Unix/Program.inc and seem always available. Pull Request: llvm#123288

Revert "[InstCombine] Transform high latency, dependent FSQRT/FDIV in…

606d0a7

…to FMUL" (llvm#123289) Reverts llvm#87474

[lldb] Remove (unused) SymbolContext::Dump (llvm#123211)

1181921

We still have GetDescription and DumpStopContext which serve a similar purpose. (The main reason this is bothering me is because I'm working through the uses of (deprecated) Function::GetAddressRange.)

EmitC: Allow arrays of size zero (llvm#123292)

0bd0765

This is allowed as a GCC extension, see https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html.

[X86][APX] Support APX + MOVRS (llvm#123264)

1274bca

Ref.: https://cdrdv2.intel.com/v1/dl/getContent/784266

[openmp] Support CET in z_Linux_asm.S (llvm#123213)

90a05f3

When libomp is built with -cf-protection, add endbr instructions to the start of functions for Intel CET support.

[AArch64] Return early rather than asserting when Size of value passe…

c8ba551

…d to targetShrinkDemandedConstant is not 32 or 64 (llvm#123084) See llvm#123029 for details.

jorickert and others added 14 commits April 14, 2025 09:57

Merge pull request #511 from Xilinx/bump_to_d0b641b7

1ee97cd

[AutoBump] Merge with d0b641b (Jan 14) (40)

Merge pull request #512 from Xilinx/bump_to_be96bd74

843962e

[AutoBump] Merge with fixes of be96bd7 (Jan 14) (41) [Only tested MLIR]

Merge pull request #513 from Xilinx/bump_to_31249e27

5073796

[AutoBump] Merge with 31249e2 (Jan 14) (42)

Merge pull request #514 from Xilinx/bump_to_f09db6a3

11597fd

[AutoBump] Merge with fixes of f09db6a (Jan 14) (43) [Only tested MLIR]

Merge pull request #515 from Xilinx/bump_to_3986cffe

6bf8653

[AutoBump] Merge with 3986cff (Jan 15) (44)

Merge pull request #519 from Xilinx/bump_to_11819214

7f52cec

[AutoBump] Merge with 1181921 (Jan 17) (48)

Merge pull request #520 from Xilinx/bump_to_0bd07652

9d07d6b

[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]

Merge pull request #523 from Xilinx/bump_to_e2402615

161b5c6

[AutoBump] Merge with e240261 (Jan 17) (52)

Merge pull request #492 from Xilinx/bump_to_392622d0

c6117c1

[AutoBump] Merge with fixes of 392622d (Dec 09) (22)[Only tested MLIR][New dependency]

Merge pull request #522 from Xilinx/bump_to_d28a4f1f

8ac91d4

[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]

Merge pull request #518 from Xilinx/bump_to_f9a80062

4ce17f1

[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]

Merge pull request #516 from Xilinx/bump_to_74025216

ccb5fa0

[AutoBump] Merge with fixes of 7402521 (Jan 15) (45) [Reverted]

Merge branch 'bump_to_1b2c8f10' into bump_to_0195ec45

81e973d

Merge pull request #517 from Xilinx/bump_to_0195ec45

f93946e

[AutoBump] Merge with 0195ec4 (Jan 15) (46)

jorickert changed the title ~~[AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Needs torch-mlir bump~~ [AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Tosa Changes Integration Apr 14, 2025

jorickert and others added 12 commits April 14, 2025 02:52

Do not link internal mlir-libs shared, even if MLIR_LINK_MLIR_DYLIB i…

51065a3

…s set, as it causes options to be registered more than once

Move getConvOpsAccType from torch-mlir 12250739bfe85b702f9503cad45c2e…

de23f58

…535ea8eb18 to LLVM

feat: allow arrayAttr parsing in constraint

dd65947

test: check some edge cases

4c14f66

chore: add assertion

cba66f4

chore: use llvm::SmallVector

a474457

test: array with constraints results

a008a59

Check valid emitc float/opaque types, not float (#525)

4cfc438

Use the emitc-provided function to check the types instead of checking for float types, as the other arith lowerings do.

cmake: Use old CMP0175 policy

61a20b8

Otherwise we get hundreds of warnings during cmake.

Add support for folding tosa.slice with tosa.slice

11193c5

Merge pull request #524 from Xilinx/jrickert.bump_integration

365c2d6

Tosa Changes Integration

Merge remote-tracking branch 'origin/feature/fused-ops' into bump_to_…

04e1b76

…1b2c8f10

jorickert enabled auto-merge April 23, 2025 09:28

jorickert merged commit 4ab068e into feature/fused-ops Apr 23, 2025
50 of 51 checks passed

jorickert deleted the bump_to_1b2c8f10 branch April 23, 2025 11:03
