[AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Tosa Changes Integration #480

Merged
jorickert merged 4,883 commits into feature/fused-ops from bump_to_1b2c8f10
Apr 23, 2025

@jorickert jorickert commented Feb 20, 2025

Notable commits:
7402521 caused passes to not be registered, reverted in 528b284

llvm#119408 caused options to be registered more than once, reverted in 51065a3

slackito and others added 30 commits January 16, 2025 16:54
We're currently excluding Wasm.cpp, because it requires emscripten. When
using header modules, Wasm.h gets compiled on its own and it also
requires emscripten, so we need to exclude both.
…lvm#122244)

The motivation for this is to allow us to match strided accesses that
are emitted from the loop vectorizer with EVL tail folding (see llvm#122232)

In these loops the step isn't loop invariant and is based off of
@llvm.experimental.get.vector.length.

We can relax this as long as we make sure to construct the updates after
the definition inside the loop, instead of the preheader.

I presume the restriction was previously added so that the step would
dominate the insertion point in the preheader. I can't think of why it
wouldn't be safe to calculate it in the loop otherwise.
…lvm#122459)

This avoids some of the pending regressions after AMDGPU implements
isExtractVecEltCheap.

In a case like shl <value, undef>, splat k, because the second operand
was fully defined, we would fall through and use the splat value for the
first operand, losing the undef high bits. This would result in an additional
instruction to handle the high bits. Add some reduced testcases for different
opcodes for one of the regressions.
Once again we have excessive TLI hooks with bad defaults. Permit this
for 32-bit element vectors, which are just use-different-register.
We should permit 16-bit vectors as cheap with legal packed instructions,
but I see some mixed improvements and regressions that need investigation.
This reverts commit 9a6433f.  ninja check-flang on x86 host fails to compile.
…vm#122672)

This avoids regressions in a future AMDGPU commit. Previously we
would have a build_vector (extract_vector_elt x), undef with free
access to the elements bloated into a shuffle of one element + undef,
which has much worse combine support than the extract.

Alternatively could check aggressivelyPreferBuildVectorSources, but
I'm not sure it's really different than isExtractVecEltCheap.
This showcases a miscompile involving a widened reduction-phi.
AArch64 instructions have a fixed size 4 bytes, no need to compute.
C++11 introduced `noexcept`, but `throw()` can be used in older
versions of the language.
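
A minimal sketch of the pattern this message describes. The macro name `EXAMPLE_NOEXCEPT` and the function `answer` are hypothetical, chosen for illustration only; the identifiers touched by the actual commit are not shown on this page.

```cpp
// Hypothetical macro name, for illustration only.
// C++11 and later spell the guarantee `noexcept`; older language modes
// fall back to the empty dynamic exception specification `throw()`.
#if __cplusplus >= 201103L
#  define EXAMPLE_NOEXCEPT noexcept
#else
#  define EXAMPLE_NOEXCEPT throw()
#endif

// A function that promises not to throw in every language mode.
inline int answer() EXAMPLE_NOEXCEPT { return 42; }
```

Only one branch of the conditional is ever compiled, so the `throw()` spelling is never seen by compilers in newer language modes.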
…leDeclsByName (llvm#123152)

Part of relanding llvm#122887.

I split this out to test where the performance regression comes from when
modules are not used.
…lvm#87474)

The proposed patch, in general, tries to transform the below code
sequence:
x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a / sqrt(a); // same as sqrt (a)

TO

(If x, r1 and r2 are all used further in the code) 
r1 = 1.0 / a
r2 = sqrt (a)
x = r1 * r2

The transform tries to make high latency sqrt and div operations
independent and also saves on one multiplication.

The patch was tested with SPEC17 suite with cpu=neoverse-v2. The
performance uplift achieved was:
544.nab_r   ~4%

No other regressions were observed. Also, no compile time differences
were observed with the patch.

Closes llvm#54652
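
The algebra in the message above can be sketched in plain C++. This is only an illustration of the source-level rewrite, not the compiler pass itself. The names `SqrtDivResults`, `beforeTransform` and `afterTransform` are hypothetical. The two forms are algebraically equal but may differ in the last bits of rounding, so presumably a compiler can only apply the rewrite under relaxed floating-point rules.

```cpp
#include <cassert>
#include <cmath>

// The three values discussed in the commit message.
struct SqrtDivResults {
  double x;   // 1.0 / sqrt(a)
  double r1;  // 1.0 / a
  double r2;  // sqrt(a)
};

// Original sequence: r1 depends on x, and r2 needs a second division
// that waits on the sqrt result.
SqrtDivResults beforeTransform(double a) {
  assert(a > 0.0 && "sketch only handles positive inputs");
  double x = 1.0 / std::sqrt(a);
  double r1 = x * x;             // same as 1.0 / a
  double r2 = a / std::sqrt(a);  // same as sqrt(a)
  return {x, r1, r2};
}

// Rewritten sequence: the division and the sqrt no longer depend on
// each other, and x is recovered with a single multiplication.
SqrtDivResults afterTransform(double a) {
  assert(a > 0.0 && "sketch only handles positive inputs");
  double r1 = 1.0 / a;
  double r2 = std::sqrt(a);
  double x = r1 * r2;
  return {x, r1, r2};
}
```

Calling both functions with the same positive input, for example `4.0`, and comparing the three fields within a small tolerance is a quick way to check the equivalence.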
…9218)

The intention is to use a "copy" instead of a "sub" to handle the high
parts of 64-bit multiply for this specific case.

This unlocks copy prop use cases where the copy can be reused by later
multiply+add sequences if possible.

Fixes: SWDEV-487672, SWDEV-487669
)

Close llvm#90154

This patch also optimizes the lookup process by using the information
provided by the `export` keyword.

Previously, the `export` keyword took part only in the check step and
was not involved in the lookup itself. That is, a name lookup for 'name'
would load all declarations with the name 'name' and then check whether
each of them is valid. This works, but it is inefficient because it may
load declarations that are not wanted.

Note that this patch uses a trick in the lookup process instead of
bringing module information into DeclarationName or considering module
information when deciding whether two declarations are the same. So I
would not be surprised if some cases are missing. That would not be a
regression, though, since those cases should already behave this way.
Issue reports are welcome.

In this patch, I split the big lookup table into a lookup table as
before and a module local lookup table, which takes a combination of
the ID of the DeclContext and the hash value of the primary module name
as the key. I also refactored the `DeclContext::lookup()` method to take
the module information, so that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
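
The table split described above can be sketched roughly as follows. This is a toy model, not clang's implementation: `LookupTables`, `DeclID`, `ContextID` and every method name are hypothetical stand-ins, and `std::map` replaces the on-disk hash tables.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for declaration and DeclContext identifiers.
using DeclID = int;
using ContextID = std::uint32_t;

class LookupTables {
public:
  // Declarations visible to every importer go into the shared table.
  void addShared(ContextID ctx, const std::string &name, DeclID decl) {
    shared_[std::make_pair(ctx, name)].push_back(decl);
  }

  // Module-local declarations are keyed by the context ID combined with
  // a hash of the owning module's primary name.
  void addModuleLocal(ContextID ctx, const std::string &module,
                      const std::string &name, DeclID decl) {
    local_[localKey(ctx, module, name)].push_back(decl);
  }

  // A lookup consults the shared table plus the module-local entries of
  // the current module only; entries local to other modules are never
  // touched. (A hash collision between module names would merge their
  // entries in this toy model.)
  std::vector<DeclID> lookup(ContextID ctx, const std::string &name,
                             const std::string &currentModule) const {
    std::vector<DeclID> result;
    auto s = shared_.find(std::make_pair(ctx, name));
    if (s != shared_.end())
      result = s->second;
    auto l = local_.find(localKey(ctx, currentModule, name));
    if (l != local_.end())
      result.insert(result.end(), l->second.begin(), l->second.end());
    return result;
  }

private:
  using SharedKey = std::pair<ContextID, std::string>;
  using LocalKey =
      std::pair<std::pair<ContextID, std::size_t>, std::string>;

  static LocalKey localKey(ContextID ctx, const std::string &module,
                           const std::string &name) {
    std::size_t h = std::hash<std::string>{}(module);
    return LocalKey(std::make_pair(ctx, h), name);
  }

  std::map<SharedKey, std::vector<DeclID>> shared_;
  std::map<LocalKey, std::vector<DeclID>> local_;
};
```

If nothing is ever added through `addModuleLocal`, the second table stays empty and lookups behave as they would with a single table, which matches the message's expectation for users who do not use C++20 modules.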
The code path has been dead since 2019.
See a3eb3d3
The libc headers are C, not C++.
This patch fixes:

  llvm/lib/Target/AMDGPU/SIISelLowering.cpp:13908:46: error:
  comparison of integers of different signs: 'uint32_t' (aka 'unsigned
  int') and 'int' [-Werror,-Wsign-compare]
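
For readers unfamiliar with this class of warning, here is a minimal sketch of one common way to resolve it. The helper `indexInRange` is hypothetical and is not the code at the location named in the error; the actual change made by the commit is not shown on this page.

```cpp
#include <cstdint>

// Hypothetical helper. Writing `index < limit` directly would compare a
// 'uint32_t' with an 'int' and trigger -Wsign-compare, because the
// signed operand is implicitly converted to unsigned.
bool indexInRange(std::uint32_t index, int limit) {
  // Handle the negative case explicitly, then compare like with like.
  if (limit < 0)
    return false;
  return index < static_cast<std::uint32_t>(limit);
}
```

Checking for a negative value before the cast keeps the comparison meaningful; a bare cast would turn `-1` into a very large unsigned number.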
Only used by Unix/Program.inc and seems to always be available.

Pull Request: llvm#123288
When iterating over function records, filtered by file name, currently,
the iteration goes over all the function records, repeatedly for each
source file, essentially giving quadratic behavior.

413647d sped up some cases by keeping
track of the indices of the function records corresponding to each file
name. This change expands the use of that map to FunctionRecordIterator.

On a test case with Firefox's libxul.so and a 2.5MB profile, this brings
down the runtime of `llvm-cov export $lib --instr-profile $prof -t lcov`
from 12 minutes with 90% spent in skipOtherFiles to 19 seconds with no
samples in skipOtherFiles at all under a sampling profiler (with a
sampling interval of 1ms).

Fixes llvm#62079
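
The difference between the two iteration strategies can be sketched as below. This is a simplified model under stated assumptions: `FunctionRecord`, `FileIndex` and the three functions are hypothetical and do not mirror the real llvm-cov data structures.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified function record.
struct FunctionRecord {
  std::string name;
  std::vector<std::string> filenames;
};

// Slow path: every query scans all records. Calling this once per
// source file gives the quadratic behavior described above.
std::vector<std::size_t>
recordsForFileScan(const std::vector<FunctionRecord> &records,
                   const std::string &file) {
  std::vector<std::size_t> result;
  for (std::size_t i = 0; i < records.size(); ++i) {
    for (const std::string &f : records[i].filenames) {
      if (f == file) {
        result.push_back(i);
        break;  // record each index at most once
      }
    }
  }
  return result;
}

using FileIndex = std::map<std::string, std::vector<std::size_t>>;

// Fast path, step 1: build the file-name-to-record-indices map once.
FileIndex buildFileIndex(const std::vector<FunctionRecord> &records) {
  FileIndex index;
  for (std::size_t i = 0; i < records.size(); ++i) {
    for (const std::string &f : records[i].filenames) {
      std::vector<std::size_t> &slot = index[f];
      if (slot.empty() || slot.back() != i)  // skip duplicate file names
        slot.push_back(i);
    }
  }
  return index;
}

// Fast path, step 2: each per-file query is now a single map lookup.
const std::vector<std::size_t> &recordsForFile(const FileIndex &index,
                                               const std::string &file) {
  static const std::vector<std::size_t> empty;
  auto it = index.find(file);
  return it == index.end() ? empty : it->second;
}
```

Both paths return the same indices for a given file; the indexed version pays the full scan once instead of once per file.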
We still have GetDescription and DumpStopContext which serve a similar
purpose.

(The main reason this is bothering me is because I'm working through the
uses of (deprecated) Function::GetAddressRange.)
…vm#122726)

…ecord level.

This fixes the incorrect diagnostic emitted when compiling the following
snippet

```
// string_view.h
template<class _CharT>
class basic_string_view;

typedef basic_string_view<char> string_view;

template<class _CharT>
class
__attribute__((__preferred_name__(string_view)))
basic_string_view {
public:
    basic_string_view() 
    {
    }
};

inline basic_string_view<char> foo()
{
  return basic_string_view<char>();
}
// A.cppm
module;
#include "string_view.h"
export module A;

// Use.cppm
module;
#include "string_view.h"
export module Use;
import A;
```

The diagnostic is 
```
string_view.h:11:5: error: 'basic_string_view<char>::basic_string_view' from module 'A.<global>' is not present in definition of 'string_view' provided earlier
```

The underlying issue is that deserialization of the `preferred_name`
attribute triggers deserialization of `basic_string_view<char>`, which
triggers the deserialization of the `preferred_name` attribute again
(since it's attached to the `basic_string_view` template).
The deserialization logic is implemented in a way that prevents it from
literally looping forever (it detects early on that it has
already seen the `string_view` typedef when trying to start its
deserialization for the second time), but leaves the typedef
deserialization in an unfinished state. Subsequently, the `string_view`
typedef from the deserialized module cannot be merged with the same
typedef from `string_view.h`, resulting in the above diagnostic.

This PR resolves the problem by delaying the deserialization of the
`preferred_name` attribute until the deserialization of the
`basic_string_view` template is completed. Because of the deferral, the
deserialization of the `preferred_name` attribute no longer re-enters
itself, since the type of the `string_view` typedef is already known
when the attribute is deserialized.
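
The deferral idea can be illustrated with a toy model. This is not clang's ASTReader: `ToyReader`, `readDecl` and `demoOrder` are hypothetical names, and the "attribute work" is just a callback that may itself read another declaration.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy reader: attribute work attached to a declaration is queued and
// only run once the outermost declaration has been fully read.
class ToyReader {
public:
  using Deferred = std::function<void(ToyReader &)>;

  void readDecl(const std::string &name, Deferred attrWork) {
    ++depth_;
    log.push_back("begin " + name);
    // Defer instead of running now, so the attribute cannot observe a
    // half-read declaration.
    if (attrWork)
      pending_.push_back(std::move(attrWork));
    log.push_back("end " + name);
    --depth_;
    if (depth_ == 0)
      flushPending();
  }

  std::vector<std::string> log;

private:
  void flushPending() {
    ++depth_;  // nested readDecl calls must not re-enter the flush
    while (!pending_.empty()) {
      Deferred work = std::move(pending_.front());
      pending_.pop_front();
      work(*this);
    }
    --depth_;
  }

  int depth_ = 0;
  std::deque<Deferred> pending_;
};

// Reads a template whose attribute refers to a typedef, and returns the
// order in which events happened.
std::vector<std::string> demoOrder() {
  ToyReader reader;
  reader.readDecl("basic_string_view", [](ToyReader &r) {
    r.log.push_back("attr preferred_name");
    r.readDecl("string_view", nullptr);
  });
  return reader.log;
}
```

In the log returned by `demoOrder()`, the `end basic_string_view` entry comes before the `attr preferred_name` entry, which is the ordering the PR description aims for.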
When libomp is built with -cf-protection, add endbr instructions to the
start of functions for Intel CET support.
This fixes a number of issues introduced in llvm#97130 when
LLVM_LIBDIR_SUFFIX is a non-empty string. Make sure that the libdir is
always referenced as `lib${LLVM_LIBDIR_SUFFIX}`, not as just `lib` or
`${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}`.

This is the standard libdir convention for all LLVM subprojects. Using
`${CMAKE_INSTALL_LIBDIR}${LLVM_LIBDIR_SUFFIX}` would result in a
duplicate suffix.
…d to targetShrinkDemandedConstant is not 32 or 64 (llvm#123084)

See llvm#123029 for details.
jorickert and others added 14 commits April 14, 2025 09:57
[AutoBump] Merge with d0b641b (Jan 14) (40)
[AutoBump] Merge with fixes of be96bd7 (Jan 14) (41) [Only tested MLIR]
[AutoBump] Merge with 31249e2 (Jan 14) (42)
[AutoBump] Merge with fixes of f09db6a (Jan 14) (43) [Only tested MLIR]
[AutoBump] Merge with 3986cff (Jan 15) (44)
[AutoBump] Merge with 1181921 (Jan 17) (48)
[AutoBump] Merge with fixes of 0bd0765 (Jan 17) (49) [Only tested MLIR]
[AutoBump] Merge with e240261 (Jan 17) (52)
[AutoBump] Merge with fixes of 392622d (Dec 09) (22)[Only tested MLIR][New dependency]
[AutoBump] Merge with fixes of d28a4f1 (Jan 17) (51) [Only tested MLIR]
[AutoBump] Merge with fixes of f9a8006 (Jan 15) (47) [Only tested MLIR]
[AutoBump] Merge with fixes of 7402521 (Jan 15) (45) [Reverted]
[AutoBump] Merge with 0195ec4 (Jan 15) (46)
@jorickert jorickert changed the title from "[AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Needs torch-mlir bump" to "[AutoBump] Merge with fixes of 1b2c8f10 (Nov 26) (16) Tosa Changes Integration" on Apr 14, 2025
@jorickert jorickert enabled auto-merge April 23, 2025 09:28
@jorickert jorickert merged commit 4ab068e into feature/fused-ops Apr 23, 2025
50 of 51 checks passed
@jorickert jorickert deleted the bump_to_1b2c8f10 branch April 23, 2025 11:03