
Conversation

SimplyTheOther
Member

The changes consist of fixes for some rebase issues and some code cleanup.

The more important fixes are:

  • actually allows the gccrs compiler driver to run something: it fixes the mismatch between the old "grs1" and the new "rust1" compiler names, without which the driver fails to work
  • re-added the location adjustments when parsing (i.e. "locus - 2" instead of just "locus"); see the sketch after this list
  • minor changes to satisfy IntelliSense (and other static analysis) by adding the necessary header includes
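
A hypothetical, minimal sketch of what the "locus - 2" adjustment in the second bullet refers to; the identifiers below are placeholders, not actual gccrs parser code:

// Hypothetical sketch only: keep the "- 2" adjustment when recording where a
// parsed node starts, rather than using the raw 'locus' value directly.
location_t node_start = locus - 2;   // the re-added "locus - 2" adjustment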

Also, this is the first test of the new PR-based contribution system, so we can use it to set a precedent for standards and best practices.


 # Define the names for selecting rust in LANGUAGES.
-rust: gccrs$(exeext) grs1$(exeext)
+rust: gccrs$(exeext) rust1$(exeext)

Awesome, that's a better name :)

@philberty merged commit 899078f into Rust-GCC:master Apr 17, 2020
philberty pushed a commit that referenced this pull request Mar 3, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
bors bot pushed a commit that referenced this pull request Jan 25, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
that is, the assertion that LC_RENAME doesn't happen first.

I think the right fix is to emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU; that way they don't appear in
columns of the next line after the pragma.  Another possibility would be
to force them at the location of the pragma.
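
A minimal sketch of the shape of that fix, assuming libcpp's location-forcing
interface (cpp_force_token_locations / cpp_stop_forcing_token_locations); the
fragment is illustrative, not the exact patch:

/* In c_cpp_builtins_optimize_pragma (sketch): make every token produced by
   the cpp_define_unused/cpp_undef calls carry BUILTINS_LOCATION so they do
   not consume columns of the line following the pragma.  */
cpp_force_token_locations (pfile, BUILTINS_LOCATION);
cpp_define_unused (pfile, "__OPTIMIZE__");
/* ... further cpp_define_unused / cpp_undef calls ... */
cpp_stop_forcing_token_locations (pfile);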

2021-12-30  Jakub Jelinek  <[email protected]>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
CohenArthur pushed a commit that referenced this pull request Apr 5, 2023
This is a regression present on the mainline and the 12 branch at -O2, but the
issue is related to vectorization, so it was present at -O3 in earlier versions.

The vcondu expander that was added for VIS 3 more than a decade ago does not
fully work, because it does not filter out the unsigned condition codes (the
instruction is an UNSPEC that accepts only signed condition codes).

While I was at it, I also added the missing vcond and vcondu expanders for
the new comparison instructions that were added in VIS 4.
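
A conceptual sketch of the signed_condition step mentioned in the ChangeLog
below (not the actual sparc.cc hunk; GCC's generic signed_condition helper is
assumed to do the mapping):

/* The VIS compare patterns are UNSPECs that accept only signed condition
   codes, so map an unsigned code coming in on operand #3 to its signed
   counterpart before emitting the comparison (GTU -> GT, GEU -> GE,
   LTU -> LT, LEU -> LE; signed codes are returned unchanged).  */
enum rtx_code code = signed_condition (GET_CODE (operands[3]));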

gcc/
	PR target/109140
	* config/sparc/sparc.cc (sparc_expand_vcond): Call signed_condition
	on operand #3 to get the final condition code.  Use std::swap.
	* config/sparc/sparc.md (vcondv8qiv8qi): New VIS 4 expander.
	(fucmp<gcond:code>8<P:mode>_vis): Move around.
	(fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis): Likewise.
	(vcondu<GCM:mode><GCM:mode>): New VIS 4 expander.

gcc/testsuite/
	* gcc.target/sparc/20230328-1.c: New test.
	* gcc.target/sparc/20230328-2.c: Likewise.
	* gcc.target/sparc/20230328-3.c: Likewise.
	* gcc.target/sparc/20230328-4.c: Likewise.
powerboat9 pushed a commit to powerboat9/gccrs that referenced this pull request Apr 9, 2025
Here we instantiate the lambda three times in producing A<0>::f:
1) in tsubst_function_type, substituting the type of A<>::f
2) in tsubst_function_decl, substituting the parameters of A<>::f
3) in regenerate_decl_from_template when instantiating A<>::f

The first one gets thrown away by maybe_rebuild_function_decl_type.  Before
r15-7202, we happily built all of them and mangled the result wrongly as
lambda #3.  After r15-7202, we try to mangle #3 as #1, which breaks because
#1 is already mangled as #1.

This patch avoids building #3 by suppressing regenerate_decl_from_template
if the template signature includes a lambda, fixing the ICE.

We now mangle the lambda as #2, which is still wrong.  Addressing that
should involve not calling tsubst_function_type from tsubst_function_decl,
and building the type from the parms types in the first place rather than
fixing it up in maybe_rebuild_function_decl_type.
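
A hypothetical, reduced illustration of the kind of code involved (this is not
the PR 119401 testcase, which is g++.dg/cpp2a/lambda-targ11.C): a lambda in a
default template argument puts a closure type into the template signature, so
instantiating the member function has to substitute into the lambda:

// Hypothetical C++20 sketch only.
template <class T, class F = decltype([] { return 0; })>
struct A
{
  static int f() { return F()(); }
};
int x = A<int>::f();   // instantiating A<int>::f substitutes into the lambda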

	PR c++/119401

gcc/cp/ChangeLog:

	* pt.cc (regenerate_decl_from_template): Don't regenerate if the
	signature involves a lambda.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/lambda-targ11.C: New test.
dkm pushed a commit that referenced this pull request Jun 26, 2025
This patch adds a new param vect-scalar-cost-multiplier to scale the scalar
costing during vectorization.  If the multiplier is set high enough then, when
using the dynamic cost model, it effectively disables the costing against
scalar code and assumes all vectorization to be profitable.

This is similar to using the unlimited cost model, but unlike unlimited it
does not fully disable the vector cost model.  That means that we still
perform comparisons between vector modes, and we also still do the costing
for alias analysis.

As an example, the following:

void
foo (char *restrict a, int *restrict b, int *restrict c,
     int *restrict d, int stride)
{
    if (stride <= 1)
        return;

    for (int i = 0; i < 3; i++)
        {
            int res = c[i];
            int t = b[i * stride];
            if (a[i] != 0)
                res = t * d[i];
            c[i] = res;
        }
}

compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
vectorize as it assumes scalar would be faster, and with
-fvect-cost-model=unlimited it picks a vector type that's so big that the large
sequence generated is working on mostly inactive lanes:

        ...
        and     p3.b, p3/z, p4.b, p4.b
        whilelo p0.s, wzr, w7
        ld1w    z23.s, p3/z, [x3, #3, mul vl]
        ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
        add     x0, x5, x0
        punpklo p6.h, p6.b
        ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
        and     p6.b, p6/z, p0.b, p0.b
        punpklo p4.h, p7.b
        ld1w    z24.s, p6/z, [x3, #2, mul vl]
        and     p4.b, p4/z, p2.b, p2.b
        uqdecw  w6
        ld1w    z26.s, p4/z, [x3]
        whilelo p1.s, wzr, w6
        mul     z27.s, p5/m, z27.s, z23.s
        ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
        punpkhi p7.h, p7.b
        mul     z24.s, p5/m, z24.s, z28.s
        and     p7.b, p7/z, p1.b, p1.b
        mul     z26.s, p5/m, z26.s, z30.s
        ld1w    z25.s, p7/z, [x3, #1, mul vl]
        st1w    z27.s, p3, [x2, #3, mul vl]
        mul     z25.s, p5/m, z25.s, z29.s
        st1w    z24.s, p6, [x2, #2, mul vl]
        st1w    z25.s, p7, [x2, #1, mul vl]
        st1w    z26.s, p4, [x2]
        ...

With -fvect-cost-model=dynamic --param vect-scalar-cost-multiplier=200
you get more reasonable code:

foo:
        cmp     w4, 1
        ble     .L1
        ptrue   p7.s, vl3
        index   z0.s, #0, w4
        ld1b    z29.s, p7/z, [x0]
        ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
        ptrue   p6.b, all
        cmpne   p7.b, p7/z, z29.b, #0
        ld1w    z31.s, p7/z, [x3]
        mul     z31.s, p6/m, z31.s, z30.s
        st1w    z31.s, p7, [x2]
.L1:
        ret

This model has been useful internally for performance exploration and cost-model
validation.  It allows us to force realistic vectorization, overriding the cost
model, so we can tell whether the cost model is correct with respect to
profitability.

gcc/ChangeLog:

	* params.opt (vect-scalar-cost-multiplier): New.
	* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
	* doc/invoke.texi (vect-scalar-cost-multiplier): Document it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/cost_model_16.c: New test.
dkm pushed a commit that referenced this pull request Jul 10, 2025
When using SVE INDEX to load an Advanced SIMD vector, we need to
take account of the different element ordering for big-endian
targets.  For example, when big-endian targets store the V4SI
constant { 0, 1, 2, 3 } in registers, 0 becomes the most
significant element, whereas INDEX always operates from the
least significant element.  A big-endian target would therefore
load V4SI { 0, 1, 2, 3 } using:

    INDEX Z0.S, #3, #-1

rather than little-endian's:

    INDEX Z0.S, #0, #1

While there, I noticed that we would only check the first vector
in a multi-vector SVE constant, which would trigger an ICE if the
other vectors turned out to be invalid.  This is pretty difficult to
trigger at the moment, since we only allow single-register modes to be
used as frontend & middle-end vector modes, but it can be seen using
the RTL frontend.

gcc/
	* config/aarch64/aarch64.cc (aarch64_sve_index_series_p): New
	function, split out from...
	(aarch64_simd_valid_imm): ...here.  Account for the different
	SVE and Advanced SIMD element orders on big-endian targets.
	Check each vector in a structure mode.

gcc/testsuite/
	* gcc.dg/rtl/aarch64/vec-series-1.c: New test.
	* gcc.dg/rtl/aarch64/vec-series-2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_2.c: Fix expected
	output for this big-endian test.
	* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_3.c: Restrict to little-endian
	targets and add more tests.
	* gcc.target/aarch64/sve/vec_init_4.c: New big-endian version
	of vec_init_3.c.
dkm pushed a commit that referenced this pull request Aug 27, 2025
…op is invariant [PR121290]

Consider the example:

void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
  for (int i = 0; i < 4; ++i)
    {
      int res = 0;
      for (int j = 0; j < 100; ++j)
        res += y[j] * z[i];
      x[i] = res;
    }
}

we currently vectorize as

f:
        movi    v30.4s, 0
        ldr     q31, [x2]
        add     x2, x1, 400
.L2:
        ld1r    {v29.4s}, [x1], 4
        mla     v30.4s, v29.4s, v31.4s
        cmp     x2, x1
        bne     .L2
        str     q30, [x0]
        ret

which is not useful because by doing outer-loop vectorization we're performing
less work per iteration than we would had we done inner-loop vectorization and
simply unrolled the inner loop.

This patch teaches the cost model that if all the leaves are invariant, the
loop cost should be adjusted by multiplying it by VF, since every vector
iteration then has at least one lane that is really just doing one scalar's
worth of work.

There are a couple of ways we could have solved this; one is to increase the
unroll factor to process more iterations of the inner loop.  This removes the
need for the broadcast; however, we don't support unrolling the inner loop
within the outer loop.  We only support unrolling by increasing the VF, which
would affect the outer loop as well as the inner loop.

We also don't directly support costing inner-loop vs outer-loop vectorization,
and as such we're left trying to predict/steer the cost model ahead of time to
what we think should be profitable.  This patch attempts to do so using a
heuristic which penalizes the outer-loop vectorization.
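
A conceptual sketch of that heuristic, using the m_loop_fully_scalar_dup flag
named in the ChangeLog below; 'body_cost' and 'estimated_vf' are placeholders,
not the exact aarch64.cc code:

/* Sketch: if every leaf of the loop body turned out to be invariant, scale
   the final vector body cost by the estimated vectorization factor so that
   outer-loop vectorization stops looking profitable.  */
if (m_loop_fully_scalar_dup)
  body_cost *= estimated_vf;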

We now cost the loop as

note:  Cost model analysis:
  Vector inside of loop cost: 2000
  Vector prologue cost: 4
  Vector epilogue cost: 0
  Scalar iteration cost: 300
  Scalar outside cost: 0
  Vector outside cost: 4
  prologue iterations: 0
  epilogue iterations: 0
missed:  cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4.
missed:  not vectorized: vectorization not profitable.
missed:  not vectorized: vector version will never be profitable.
missed:  Loop costings may not be worthwhile.

And subsequently generate:

.L5:
        add     w4, w4, w7
        ld1w    z24.s, p6/z, [x0, #1, mul vl]
        ld1w    z23.s, p6/z, [x0, #2, mul vl]
        ld1w    z22.s, p6/z, [x0, #3, mul vl]
        ld1w    z29.s, p6/z, [x0]
        mla     z26.s, p6/m, z24.s, z30.s
        add     x0, x0, x8
        mla     z27.s, p6/m, z23.s, z30.s
        mla     z28.s, p6/m, z22.s, z30.s
        mla     z25.s, p6/m, z29.s, z30.s
        cmp     w4, w6
        bls     .L5

and avoids the load and replicate if it knows it has enough vector pipes to do
so.

gcc/ChangeLog:

	PR target/121290
	* config/aarch64/aarch64.cc
	(class aarch64_vector_costs ): Add m_loop_fully_scalar_dup.
	(aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops.
	(adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup.

gcc/testsuite/ChangeLog:

	PR target/121290
	* gcc.target/aarch64/pr121290.c: New test.