This is a copy of an internal page that @chuongg3 and I kept while going through each of the operations for AArch64 GISel, making sure they don't fall back. It is not complete yet (and the internal version had a few more details), but it is better to have this upstream. Some of it might now be out of date.
A few high-level comments:
- This does not include SVE; we should probably do the same elsewhere.
- BF16 still needs to be added, but requires a new way to specify the types / operations (the patterns were disabled in #124113, "[GISel] Explicitly disable BF16 tablegen patterns").
- BigEndian isn't handled yet.
- Currently some operations widen, some promote. We should stick to one (probably widen).
- Blank spaces usually mean not checked / not supported. We will get to the point where random-testing will start to be more useful.
Edit: There is now https://davemgreen.github.io/gisel.html, which shows what works and whether it is smaller or bigger than SDAG. It is still WIP and doesn't show all operations yet.
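
One way to probe a single cell of the tables below is to generate a small IR function for an operation and type, then compile it with fallback disabled. The following is only an illustrative sketch: the helper names `binop_ir` and `llc_command` are made up for this example, and it assumes an `llc` binary is available on the PATH. `-global-isel` and `-global-isel-abort=1` are the existing llc options that turn a GlobalISel failure into an error instead of a fallback to SelectionDAG.

```python
def binop_ir(op, ty):
    """Return LLVM IR text for a function applying binary `op` to two `ty` values.

    `op` is an IR opcode such as "add"; `ty` is an IR type such as "i32"
    or "<4 x i32>". Hypothetical helper, not part of LLVM.
    """
    # Build a valid symbol name, e.g. "add_4xi32" for ("add", "<4 x i32>").
    name = f"{op}_{ty}".replace("<", "").replace(">", "").replace(" ", "")
    return (
        f"define {ty} @{name}({ty} %a, {ty} %b) {{\n"
        f"entry:\n"
        f"  %r = {op} {ty} %a, %b\n"
        f"  ret {ty} %r\n"
        f"}}\n"
    )


def llc_command(path):
    """Return an llc invocation that errors out instead of falling back."""
    return [
        "llc",
        "-mtriple=aarch64",
        "-global-isel",
        "-global-isel-abort=1",  # abort on GlobalISel failure, no SDAG fallback
        path,
        "-o",
        "-",
    ]


if __name__ == "__main__":
    # One representative type per legend category; the choice is arbitrary.
    for ty in ["i32", "<4 x i32>", "i128", "<2 x i8>", "i65", "<3 x i32>"]:
        print(binop_ir("add", ty))
    print(" ".join(llc_command("test.ll")))
```

Writing each generated function to a file and running the command (for example with `subprocess.run`) gives a crude pass/fail per cell: a non-zero exit status means the operation did not make it through GlobalISel.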
Legend:
- Scalar normal = i8/i16/i32/i64
- Vector legal = v8i8/v4i16/v2i32 + v16i8/v8i16/v4i32/v2i64
- Vector larger/smaller = i8/i16/i32/i64 types with non-legal sizes
- i128 = scalar/vector
- i1 = scalar/vector
- Scalar ext = non-power2 sizes, including larger sizes
- Vector odd widths = i8/i16/i32/i64 with non-power-2 widths.
- Vector odd eltsize = non-power2 elt sizes (or i128, etc).
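
As a rough illustration of how the legend partitions integer types, the sketch below maps an element width and optional element count to one of the column names. It is one possible reading, not a definition: the function name `classify` is invented here, any scalar that is not i1, i8/i16/i32/i64 or i128 is assumed to land in "Scalar ext", and vectors of i1 or i128 elements are assumed to belong to the i1 and i128 columns rather than "Vector odd eltsize".

```python
# (element count, element bits) pairs listed as "Vector legal" in the legend.
LEGAL_VECTORS = {(8, 8), (4, 16), (2, 32), (16, 8), (8, 16), (4, 32), (2, 64)}
NORMAL_BITS = {8, 16, 32, 64}


def is_pow2(n):
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


def classify(bits, count=None):
    """Return the legend column for iN (count=None) or <count x iN>.

    Assumed reading of the legend; see the notes above this block.
    """
    if bits == 1:
        return "i1"
    if bits == 128:
        return "i128"
    if count is None:
        # Scalars: the four normal widths, everything else is "ext".
        return "Scalar normal" if bits in NORMAL_BITS else "Scalar ext"
    if bits not in NORMAL_BITS:
        return "Vector odd eltsize"
    if (count, bits) in LEGAL_VECTORS:
        return "Vector legal"
    if not is_pow2(count):
        return "Vector odd widths"
    # Normal element type, power-of-2 count, but not a 64/128-bit legal vector.
    return "Vector larger/smaller"


if __name__ == "__main__":
    for bits, count in [(32, None), (65, None), (32, 4), (8, 2), (32, 3), (24, 4)]:
        print(bits, count, classify(bits, count))
```

Under this reading v2i8 and v8i32 fall under "Vector larger/smaller", v3i32 under "Vector odd widths", and v4i24 under "Vector odd eltsize".
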
| Operation | Scalar normal | Vector legal | ptr | i128/i1 | Vector larger / smaller | Scalar ext | Vector odd widths | Vector odd eltsizes | Additional Notes |
|---|---|---|---|---|---|---|---|---|---|
| load | y | y | y/y | #116006 | |||||
| store | y | y | y/y | ||||||
| bitcast? ptrtoint? inttoptr? | y | y | |||||||
| getelementptr | y | ||||||||
| phi | y | y | y/y | y | |||||
| select | |||||||||
| memcpy? memmove? memset? bzero? | |||||||||

| Int Operation | Scalar normal | Vector normal | i128 s/v | i1 s/v | Vector larger / smaller | Scalar non-power-2 | Vector odd widths | Vector odd eltsizes | Additional Notes |
|---|---|---|---|---|---|---|---|---|---|
| add | y | y | y/y | y | y | x | x | https://godbolt.org/z/6c1rfWTK8 | |
| sub | y | y | y/y | y | y | x | x | ||
| mul | y | y | y/y inefficient | y | Scalar i128 #115512. https://godbolt.org/z/8Wd8zhezc | ||||
| sdiv, udiv | y | y | y/y | y | Scalar i1 could be simpler. https://godbolt.org/z/45qMq6cvh. | ||||
| srem, urem | y | y | y/y | y | Scalar i1: | ||||
| zext, sext, anyext | y | y | ZEXT: Global ISel could be improved to match SDAG by using BIC | ||||||
| trunc | y | y | y | x Non-pow2 larger than 8 | |||||
| and | y | y | y/y | y | https://godbolt.org/z/6Y98TnYv8 | ||||
| or | y | y | y/y | y | |||||
| xor | y | y | y/y | y | |||||
| - not | y | y | y | y | https://godbolt.org/z/rh4ob1be7 | ||||
| shl | y | y | y | y (v2i8) | x | Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify. | |||
| ashr | y | y | y | y(v2i8) | x | Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify. | |||
| lshr | y | y | y | y(v2i8) | x | Scalar i8/i16 unnecessarily clear shift amount. i1 could simplify. | |||
| icmp | y | y | y (i128 could be better) | x | y(v2i8) | i128 could do a lot better. | |||
| select | y | y | y | y (v2i8) | Scalar: Unnecessary AND to clear upper lanes of the condition register | ||||
| abs | y | y | y | x | y | https://godbolt.org/z/Tobs7YeoT | |||
| smin/smax/umin/umax | y | y | y | y | x > i128 | i1/i128 could do better. https://godbolt.org/z/j7nx789oz. | |||
| uaddsat/usubsat/saddsat/ssubsat | y | y | y/y | y/x | y | i128 could do better. i1 vectors fall back. https://godbolt.org/z/4MT14bfsv | |||
| bitreverse | y | y | y | y | https://godbolt.org/z/3sd988Mhd | ||||
| bswap | y | x | y | x | y | ||||
| ctlz | y | y | y | y | x > i128 | #131514 | |||
| cttz | y | y | y | x | x > i128 | #131513 | |||
| ctpop | y | y | y | x | x | #131513 | |||
| fshr/fshl | y | y | y | x | x NonPow2 > 128 | Scalar Normal: | |||
| - rotr/rotl? | y | y | y | y | y | ||||
| uaddo, usubo, uadde, usube? | |||||||||
| umulo, smulo? | |||||||||
| umulh, smulh | |||||||||
| ushlsat, sshlsat | |||||||||
| smulfix, umulfix | |||||||||
| smulfixsat, umulfixsat | |||||||||
| sdivfix, udivfix | |||||||||
| sdivfixsat, udivfixsat | |||||||||

| FP Operation | Scalar normal | Vector legal | f128 s/v | Vector smaller / larger | bf16 s/v | Vector widths | Additional Notes | ||
|---|---|---|---|---|---|---|---|---|---|
| fadd | y | y | y/y | y | https://godbolt.org/z/bYWfo9v16 | ||||
| fsub | y | y | y/y | y | |||||
| fmul | y | y | y/y | y | |||||
| fma | y | y | y/y | y | https://godbolt.org/z/1osE3Whaq | ||||
| fmuladd | y | y | y/y | y | |||||
| fdiv | y | y | y/y | y | |||||
| frem | y | y | y/y | y | |||||
| fneg | y | y | y/y | y | https://godbolt.org/z/rz96eh3PW | ||||
| fpext | y | y | y/y | y | https://godbolt.org/z/358EG4j7r | ||||
| fptrunc | y | y | y/y | y | https://godbolt.org/z/7a7hq6j68 | ||||
| fptosi, fptoui | y | y | y/y | y | |||||
| fptosisat, fptouisat | |||||||||
| sitofp, uitofp | y | y | y/y | y | https://godbolt.org/z/j7Prz7qj6 | ||||
| fabs | y | y | y/y | y | https://godbolt.org/z/o95h4a9es | ||||
| fsqrt | y | y | y/y | y | |||||
| ceil, floor, trunc, rint, nearbyint | y | y | y/y | y | https://godbolt.org/z/zjMqq5oeo | ||||
| lrint, llrint, lround, llround | |||||||||
| fminnum, fmaxnum | y | y | y/y | y | |||||
| fminimum, fmaximum | y | y | y | ||||||
| fminimumnum, fmaximumnum | |||||||||
| fcopysign | y | y | y/y | y | https://godbolt.org/z/aq5bbc4jG | ||||
| fpow | y | y | y/y | y | https://godbolt.org/z/WEeWYj1e4 | ||||
| fpowi | y | y | y/y | y | |||||
| sin, cos, etc | y | y | y/y | y | |||||
| fexp, fexp2, flog, flog2, flog10 | y | y | y/y | y | |||||
| fldexp, frexp | |||||||||
| fcanonicalize | |||||||||
| is_fpclass | |||||||||

| Vector Operation | Scalar normal | Vector legal | Vector smaller / larger | ptr | Scalar ext | Vector odd widths | Vector odd eltsizes | Additional Notes | |
|---|---|---|---|---|---|---|---|---|---|
| insert | - | - | y | y | - | ||||
| extract | - | - | y | y | - | ||||
| shuffle* | - | - | - | ||||||
| - dup | - | - | y | - | |||||
| - ext | - | - | y | y | - | ||||
| - zip1/zip2/uzp1/uzp2/trn1/trn2 | - | - | y | - | |||||
| - tbl | - | - | y | y | - | Could do with tbl2/tbl4 combines | |||
| - reverse | - | - | y | y | - | Needs full reverses from #119083 | |||
| - perfect shuffles | - | - | #106446 | - | |||||
| reduce.add | - | - | y | - | - | ||||
| reduce.mul | - | - | - | ||||||
| reduce.smin/smax/umin/umax | - | - | - | ||||||
| reduce.and/or/xor | - | - | - | ||||||
| reduce.fadd | - | - | y | y | - | - | |||
| reduce.fadd strict | - | - | y | y | - | - | These just scalarize which isn't always the most efficient. | ||
| reduce.fmul | - | - | y | y | - | - | |||
| reduce.fmul strict | - | - | y | y | - | - | |||
| reduce.fmin/fmax/fminimum/fmaximum | - | - | y | - | x |

What follows is a (very!) incomplete list of random optimizations and other missing features we will need for GlobalISel. If anyone is interested in any of them, there is lots to do. Jump in and get involved.

- SVE
- Big endian
- BF16
- Make sure intrinsics all lower successfully.
- HADD operations and combines (#118083)
- ABD operations and combines (#118085)
- We currently don't have <1 x s64> types, so in certain cases GPRs are used instead of FPRs and RegBankSelect does not compensate.
- In general RegBankSelect isn't amazing and can get the wrong types for certain phis/selects.
- Improve divide by constant (#118090)
- An implementation of https://reviews.llvm.org/D121088 (https://godbolt.org/z/j1faEbnGj)
- Support fixed-point converts (llvm/test/CodeGen/AArch64/fcvt-fixed.ll)
- SLI/SRI combines (e.g. llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll)
- Optimizations for expanded funnel shifts (llvm/test/CodeGen/AArch64/funnel-shift.ll)
- More efficient bitwise reductions (https://reviews.llvm.org/D148185,
- Generate cmlt instead of sshr (see https://reviews.llvm.org/D115457, https://godbolt.org/z/89614Edqb. Ideally the patterns should just match.)
- Pick cmgez over cmgt in this example: https://godbolt.org/z/bs6re13oY.
- Improve i128 min/max to use subc (https://godbolt.org/z/7bG5KfeYn).
- Bitreverse needs legalization (https://godbolt.org/z/3sd988Mhd).
- ctpop needs legalization (https://godbolt.org/z/75adTxavn).
- Implement #91148 ([DAG] Lower frem of power-2 using div/trunc/mul+sub) for GISel.
- f16 selects are not used.
- Select to xor from https://reviews.llvm.org/D109149, see select-constant-xor.ll.
- (select c, (and X, 1), 0) -> (and (zext c), X), from de7881e, see select-to-and-zext.ll.
- ccmp improvements, for example 43a0016.
- sitofp(load)
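
Three of the rewrites in the list above are simple enough to sanity-check outside the compiler. The sketch below models them on plain Python integers and floats; it is not how LLVM implements them. The 8-bit width, the helper names, and the exact shape of the frem expansion (`x - trunc(x / p) * p`, inferred from the title of #91148) are assumptions made for illustration.

```python
import math

BITS = 8                      # assumed lane width for the integer checks
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret a BITS-wide unsigned value as two's complement."""
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def select_form(c, x):
    """(select c, (and X, 1), 0)"""
    return (x & 1) if c else 0


def and_zext_form(c, x):
    """(and (zext c), X), where c is an i1 so zext yields 0 or 1."""
    return (1 if c else 0) & x


def sshr_sign_mask(x):
    """Arithmetic shift right by BITS-1, which replicates the sign bit."""
    return (to_signed(x) >> (BITS - 1)) & MASK


def cmlt_zero(x):
    """Signed compare less-than zero: all ones when true, zero otherwise."""
    return MASK if to_signed(x) < 0 else 0


def frem_pow2(x, p):
    """Assumed expansion of frem x, p for a power-of-2 p."""
    return x - math.trunc(x / p) * p


def check_all():
    """Brute-force the integer identities and spot-check the frem expansion."""
    for x in range(1 << BITS):
        for c in (False, True):
            assert select_form(c, x) == and_zext_form(c, x)
        assert sshr_sign_mask(x) == cmlt_zero(x)
    for i in range(-64, 65):
        x = i / 8  # exactly representable, so every step below is exact
        for p in (0.5, 1.0, 2.0, 4.0):
            # Note: == treats -0.0 and 0.0 as equal, so the sign of a zero
            # result is not checked here.
            assert frem_pow2(x, p) == math.fmod(x, p)
    return True


if __name__ == "__main__":
    print(check_all())
```

The frem check deliberately ignores the sign of zero results and very large or very small magnitudes, both of which a real lowering would need to handle.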