
Conversation

SimplyTheOther
Member

The changes consist of fixes for some rebase issues and some code cleanup.

The more important fixes are:

  • actually allows the gccrs compiler driver to run something: it fixes the mismatch between the old "grs1" and the new "rust1" compiler names, without which the driver fails to work
  • re-added the location adjustments when parsing (i.e. "locus - 2" instead of just "locus"); see the sketch after this list
  • minor changes to satisfy IntelliSense (and other static analysis) by adding the necessary header includes
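
A hypothetical, minimal sketch of what the "locus - 2" adjustment in the second bullet refers to; the identifiers below are placeholders, not actual gccrs parser code:

// Hypothetical sketch only: keep the "- 2" adjustment when recording where a
// parsed node starts, rather than using the raw 'locus' value directly.
location_t node_start = locus - 2;   // the re-added "locus - 2" adjustment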

Also, this is the first test of the new PR-based contribution system, so we can use it to set a precedent for standards and best practices.


 # Define the names for selecting rust in LANGUAGES.
-rust: gccrs$(exeext) grs1$(exeext)
+rust: gccrs$(exeext) rust1$(exeext)

Awesome, that's a better name :)

@philberty merged commit 899078f into Rust-GCC:master Apr 17, 2020
philberty pushed a commit that referenced this pull request Mar 3, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    #5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    #6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede0956cb1f4b692dfa619ccfa6ab1de28e19b.
bors bot pushed a commit that referenced this pull request Jan 25, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
that is, the assertion that LC_RENAME doesn't happen first.

I think the right fix is to emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU; that way they don't appear in
columns of the next line after the pragma.  Another possibility would be
to force them at the location of the pragma.
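
A minimal sketch of the shape of that fix, assuming libcpp's location-forcing
interface (cpp_force_token_locations / cpp_stop_forcing_token_locations); the
fragment is illustrative, not the exact patch:

/* In c_cpp_builtins_optimize_pragma (sketch): make every token produced by
   the cpp_define_unused/cpp_undef calls carry BUILTINS_LOCATION so they do
   not consume columns of the line following the pragma.  */
cpp_force_token_locations (pfile, BUILTINS_LOCATION);
cpp_define_unused (pfile, "__OPTIMIZE__");
/* ... further cpp_define_unused / cpp_undef calls ... */
cpp_stop_forcing_token_locations (pfile);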

2021-12-30  Jakub Jelinek  <[email protected]>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
CohenArthur pushed a commit that referenced this pull request Apr 5, 2023
This is a regression present on the mainline and the 12 branch at -O2, but the
issue is related to vectorization, so it was present at -O3 in earlier versions.

The vcondu expander that was added for VIS 3 more than a decade ago does not
fully work, because it does not filter out the unsigned condition codes (the
instruction is an UNSPEC that accepts only signed condition codes).

While I was at it, I also added the missing vcond and vcondu expanders for
the new comparison instructions that were added in VIS 4.
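
A conceptual sketch of the signed_condition step mentioned in the ChangeLog
below (not the actual sparc.cc hunk; GCC's generic signed_condition helper is
assumed to do the mapping):

/* The VIS compare patterns are UNSPECs that accept only signed condition
   codes, so map an unsigned code coming in on operand #3 to its signed
   counterpart before emitting the comparison (GTU -> GT, GEU -> GE,
   LTU -> LT, LEU -> LE; signed codes are returned unchanged).  */
enum rtx_code code = signed_condition (GET_CODE (operands[3]));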

gcc/
	PR target/109140
	* config/sparc/sparc.cc (sparc_expand_vcond): Call signed_condition
	on operand #3 to get the final condition code.  Use std::swap.
	* config/sparc/sparc.md (vcondv8qiv8qi): New VIS 4 expander.
	(fucmp<gcond:code>8<P:mode>_vis): Move around.
	(fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis): Likewise.
	(vcondu<GCM:mode><GCM:mode>): New VIS 4 expander.

gcc/testsuite/
	* gcc.target/sparc/20230328-1.c: New test.
	* gcc.target/sparc/20230328-2.c: Likewise.
	* gcc.target/sparc/20230328-3.c: Likewise.
	* gcc.target/sparc/20230328-4.c: Likewise.
powerboat9 pushed a commit to powerboat9/gccrs that referenced this pull request Apr 9, 2025
Here we instantiate the lambda three times in producing A<0>::f:
1) in tsubst_function_type, substituting the type of A<>::f
2) in tsubst_function_decl, substituting the parameters of A<>::f
3) in regenerate_decl_from_template when instantiating A<>::f

The first one gets thrown away by maybe_rebuild_function_decl_type.  Before
r15-7202, we happily built all of them and mangled the result wrongly as
lambda #3.  After r15-7202, we try to mangle #3 as #1, which breaks because
#1 is already mangled as #1.

This patch avoids building #3 by suppressing regenerate_decl_from_template
if the template signature includes a lambda, fixing the ICE.

We now mangle the lambda as #2, which is still wrong.  Addressing that
should involve not calling tsubst_function_type from tsubst_function_decl,
and building the type from the parms types in the first place rather than
fixing it up in maybe_rebuild_function_decl_type.
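
A hypothetical, reduced illustration of the kind of code involved (this is not
the PR 119401 testcase, which is g++.dg/cpp2a/lambda-targ11.C): a lambda in a
default template argument puts a closure type into the template signature, so
instantiating the member function has to substitute into the lambda:

// Hypothetical C++20 sketch only.
template <class T, class F = decltype([] { return 0; })>
struct A
{
  static int f() { return F()(); }
};
int x = A<int>::f();   // instantiating A<int>::f substitutes into the lambda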

	PR c++/119401

gcc/cp/ChangeLog:

	* pt.cc (regenerate_decl_from_template): Don't regenerate if the
	signature involves a lambda.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/lambda-targ11.C: New test.
dkm pushed a commit that referenced this pull request Jun 26, 2025
This patch adds a new param vect-scalar-cost-multiplier to scale the scalar
costing during vectorization.  If the multiplier is set high enough then, when
using the dynamic cost model, it effectively disables the costing against
scalar code and assumes all vectorization to be profitable.

This is similar to using the unlimited cost model, but unlike unlimited it
does not fully disable the vector cost model.  That means that we still
perform comparisons between vector modes, and we also still do the costing
for alias analysis.

As an example, the following:

void
foo (char *restrict a, int *restrict b, int *restrict c,
     int *restrict d, int stride)
{
    if (stride <= 1)
        return;

    for (int i = 0; i < 3; i++)
        {
            int res = c[i];
            int t = b[i * stride];
            if (a[i] != 0)
                res = t * d[i];
            c[i] = res;
        }
}

compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
vectorize as it assumes scalar would be faster, and with
-fvect-cost-model=unlimited it picks a vector type that's so big that the large
sequence generated is working on mostly inactive lanes:

        ...
        and     p3.b, p3/z, p4.b, p4.b
        whilelo p0.s, wzr, w7
        ld1w    z23.s, p3/z, [x3, #3, mul vl]
        ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
        add     x0, x5, x0
        punpklo p6.h, p6.b
        ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
        and     p6.b, p6/z, p0.b, p0.b
        punpklo p4.h, p7.b
        ld1w    z24.s, p6/z, [x3, #2, mul vl]
        and     p4.b, p4/z, p2.b, p2.b
        uqdecw  w6
        ld1w    z26.s, p4/z, [x3]
        whilelo p1.s, wzr, w6
        mul     z27.s, p5/m, z27.s, z23.s
        ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
        punpkhi p7.h, p7.b
        mul     z24.s, p5/m, z24.s, z28.s
        and     p7.b, p7/z, p1.b, p1.b
        mul     z26.s, p5/m, z26.s, z30.s
        ld1w    z25.s, p7/z, [x3, #1, mul vl]
        st1w    z27.s, p3, [x2, #3, mul vl]
        mul     z25.s, p5/m, z25.s, z29.s
        st1w    z24.s, p6, [x2, #2, mul vl]
        st1w    z25.s, p7, [x2, #1, mul vl]
        st1w    z26.s, p4, [x2]
        ...

With -fvect-cost-model=dynamic --param vect-scalar-cost-multiplier=200
you get more reasonable code:

foo:
        cmp     w4, 1
        ble     .L1
        ptrue   p7.s, vl3
        index   z0.s, #0, w4
        ld1b    z29.s, p7/z, [x0]
        ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
        ptrue   p6.b, all
        cmpne   p7.b, p7/z, z29.b, #0
        ld1w    z31.s, p7/z, [x3]
        mul     z31.s, p6/m, z31.s, z30.s
        st1w    z31.s, p7, [x2]
.L1:
        ret

This model has been useful internally for performance exploration and cost-model
validation.  It allows us to force realistic vectorization, overriding the cost
model, so we can tell whether the cost model is correct with respect to
profitability.

gcc/ChangeLog:

	* params.opt (vect-scalar-cost-multiplier): New.
	* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
	* doc/invoke.texi (vect-scalar-cost-multiplier): Document it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/cost_model_16.c: New test.
dkm pushed a commit that referenced this pull request Jul 10, 2025
When using SVE INDEX to load an Advanced SIMD vector, we need to
take account of the different element ordering for big-endian
targets.  For example, when big-endian targets store the V4SI
constant { 0, 1, 2, 3 } in registers, 0 becomes the most
significant element, whereas INDEX always operates from the
least significant element.  A big-endian target would therefore
load V4SI { 0, 1, 2, 3 } using:

    INDEX Z0.S, #3, #-1

rather than little-endian's:

    INDEX Z0.S, #0, #1

While there, I noticed that we would only check the first vector
in a multi-vector SVE constant, which would trigger an ICE if the
other vectors turned out to be invalid.  This is pretty difficult to
trigger at the moment, since we only allow single-register modes to be
used as frontend & middle-end vector modes, but it can be seen using
the RTL frontend.

gcc/
	* config/aarch64/aarch64.cc (aarch64_sve_index_series_p): New
	function, split out from...
	(aarch64_simd_valid_imm): ...here.  Account for the different
	SVE and Advanced SIMD element orders on big-endian targets.
	Check each vector in a structure mode.

gcc/testsuite/
	* gcc.dg/rtl/aarch64/vec-series-1.c: New test.
	* gcc.dg/rtl/aarch64/vec-series-2.c: Likewise.
	* gcc.target/aarch64/sve/acle/general/dupq_2.c: Fix expected
	output for this big-endian test.
	* gcc.target/aarch64/sve/acle/general/dupq_4.c: Likewise.
	* gcc.target/aarch64/sve/vec_init_3.c: Restrict to little-endian
	targets and add more tests.
	* gcc.target/aarch64/sve/vec_init_4.c: New big-endian version
	of vec_init_3.c.
dkm pushed a commit that referenced this pull request Aug 27, 2025
…op is invariant [PR121290]

Consider the example:

void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
  for (int i = 0; i < 4; ++i)
    {
      int res = 0;
      for (int j = 0; j < 100; ++j)
        res += y[j] * z[i];
      x[i] = res;
    }
}

we currently vectorize as

f:
        movi    v30.4s, 0
        ldr     q31, [x2]
        add     x2, x1, 400
.L2:
        ld1r    {v29.4s}, [x1], 4
        mla     v30.4s, v29.4s, v31.4s
        cmp     x2, x1
        bne     .L2
        str     q30, [x0]
        ret

which is not useful because by doing outer-loop vectorization we're performing
less work per iteration than we would had we done inner-loop vectorization and
simply unrolled the inner loop.

This patch teaches the cost model that if all the leaves are invariant, the
loop cost should be adjusted by multiplying it by VF, since every vector
iteration then has at least one lane that is really just doing one scalar's
worth of work.

There are a couple of ways we could have solved this; one is to increase the
unroll factor to process more iterations of the inner loop.  This removes the
need for the broadcast; however, we don't support unrolling the inner loop
within the outer loop.  We only support unrolling by increasing the VF, which
would affect the outer loop as well as the inner loop.

We also don't directly support costing inner-loop vs outer-loop vectorization,
and as such we're left trying to predict/steer the cost model ahead of time to
what we think should be profitable.  This patch attempts to do so using a
heuristic which penalizes the outer-loop vectorization.
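
A conceptual sketch of that heuristic, using the m_loop_fully_scalar_dup flag
named in the ChangeLog below; 'body_cost' and 'estimated_vf' are placeholders,
not the exact aarch64.cc code:

/* Sketch: if every leaf of the loop body turned out to be invariant, scale
   the final vector body cost by the estimated vectorization factor so that
   outer-loop vectorization stops looking profitable.  */
if (m_loop_fully_scalar_dup)
  body_cost *= estimated_vf;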

We now cost the loop as

note:  Cost model analysis:
  Vector inside of loop cost: 2000
  Vector prologue cost: 4
  Vector epilogue cost: 0
  Scalar iteration cost: 300
  Scalar outside cost: 0
  Vector outside cost: 4
  prologue iterations: 0
  epilogue iterations: 0
missed:  cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4.
missed:  not vectorized: vectorization not profitable.
missed:  not vectorized: vector version will never be profitable.
missed:  Loop costings may not be worthwhile.

And subsequently generate:

.L5:
        add     w4, w4, w7
        ld1w    z24.s, p6/z, [x0, #1, mul vl]
        ld1w    z23.s, p6/z, [x0, #2, mul vl]
        ld1w    z22.s, p6/z, [x0, #3, mul vl]
        ld1w    z29.s, p6/z, [x0]
        mla     z26.s, p6/m, z24.s, z30.s
        add     x0, x0, x8
        mla     z27.s, p6/m, z23.s, z30.s
        mla     z28.s, p6/m, z22.s, z30.s
        mla     z25.s, p6/m, z29.s, z30.s
        cmp     w4, w6
        bls     .L5

and avoids the load and replicate if it knows it has enough vector pipes to do
so.

gcc/ChangeLog:

	PR target/121290
	* config/aarch64/aarch64.cc
	(class aarch64_vector_costs ): Add m_loop_fully_scalar_dup.
	(aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops.
	(adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup.

gcc/testsuite/ChangeLog:

	PR target/121290
	* gcc.target/aarch64/pr121290.c: New test.