@@ -20315,6 +20315,76 @@ Arguments:
2031520315""""""""""
2031620316The argument to this intrinsic must be a vector of floating-point values.
2031720317
20318+ Vector Partial Reduction Intrinsics
20319+ -----------------------------------
20320+
20321+ Partial reductions of vectors can be expressed using the following intrinsics.
20322+ Each one reduces the concatenation of the two vector arguments down to the
20323+ number of elements of the result vector type.
20324+
20325+ Other than the reduction operator (e.g. add, fadd), the way in which the
20326+ concatenated arguments are reduced is entirely unspecified. By their nature,
20327+ these intrinsics are not expected to be useful in isolation; instead, they
20328+ implement the first phase of an overall reduction operation.
20329+
20330+ The typical use case is loop vectorization where reductions are split into an
20331+ in-loop phase, where maintaining an unordered vector result is important for
20332+ performance, and an out-of-loop phase to calculate the final scalar result.
20333+
20334+ By avoiding the introduction of new ordering constraints, these intrinsics
20335+ make it easier to use a target's accumulation instructions.
20336+
20337+ '``llvm.vector.partial.reduce.add.*``' Intrinsic
20338+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20339+
20340+ Syntax:
20341+ """""""
20342+ This is an overloaded intrinsic.
20343+
20344+ ::
20345+
20346+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
20347+ declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
20348+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
20349+ declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
20350+
20351+ Arguments:
20352+ """"""""""
20353+
20354+ The first argument is an integer vector with the same type as the result.
20355+
20356+ The second argument is a vector with the same element type as the result and
20357+ a length that is a known integer multiple of the result's length.
20358+
20359+ '``llvm.vector.partial.reduce.fadd.*``' Intrinsic
20360+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20361+
20362+ Syntax:
20363+ """""""
20364+ This is an overloaded intrinsic.
20365+
20366+ ::
20367+
20368+ declare <4 x float> @llvm.vector.partial.reduce.fadd.v4f32.v8f32(<4 x float> %a, <8 x float> %b)
20369+ declare <vscale x 4 x float> @llvm.vector.partial.reduce.fadd.nxv4f32.nxv8f32(<vscale x 4 x float> %a, <vscale x 8 x float> %b)
20370+
20371+ Arguments:
20372+ """"""""""
20373+
20374+ The first argument is a floating-point vector with the same type as the result.
20375+
20376+ The second argument is a vector with the same element type as the result and
20377+ a length that is a known integer multiple of the result's length.
20378+
20379+ Semantics:
20380+ """"""""""
20381+
20382+ Because the way in which the arguments to this floating-point intrinsic are
20383+ reduced is unspecified, this intrinsic assumes floating-point reassociation and
20384+ contraction. Results may therefore vary, both because of reordering and because
20385+ of lowering to different instructions (including combining multiple
20386+ instructions into a single one).
20387+
2031820388'``llvm.vector.insert``' Intrinsic
2031920389^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2032020390
@@ -20688,74 +20758,6 @@ Note that it has the following implications:
2068820758- If ``%cnt`` is non-zero, the return value is non-zero as well.
2068920759- If ``%cnt`` is less than or equal to ``%max_lanes``, the return value is equal to ``%cnt``.
2069020760
20691- Vector Partial Reduction Intrinsics
20692- -----------------------------------
20693-
20694- Partial horizontal reductions of vectors can be expressed using the following intrinsics.
20695- Each one reduces the concatenation of the two vector arguments down to the number of elements
20696- of the result vector type.
20697-
20698- Other than the reduction operator (e.g. add, fadd) the way in which the concatenated
20699- arguments is reduced is entirely unspecified. By their nature these intrinsics
20700- are not expected to be useful in isolation but instead implement the first phase
20701- of an overall reduction operation.
20702-
20703- The typical use case is loop vectorization where reductions are split into an
20704- in-loop phase, where maintaining an unordered vector result is important for
20705- performance, and an out-of-loop phase to calculate the final scalar result.
20706-
20707- By avoiding the introduction of new ordering constraints, these intrinsics
20708- enhance the ability to leverage a target's accumulation instructions.
20709-
20710- '``llvm.vector.partial.reduce.add.*``' Intrinsic
20711- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20712-
20713- Syntax:
20714- """""""
20715- This is an overloaded intrinsic.
20716-
20717- ::
20718-
20719- declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v8i32(<4 x i32> %a, <8 x i32> %b)
20720- declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v4i32.v16i32(<4 x i32> %a, <16 x i32> %b)
20721- declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv8i32(<vscale x 4 x i32> %a, <vscale x 8 x i32> %b)
20722- declare <vscale x 4 x i32> @llvm.vector.partial.reduce.add.nxv4i32.nxv4i32.nxv16i32(<vscale x 4 x i32> %a, <vscale x 16 x i32> %b)
20723-
20724- Arguments:
20725- """"""""""
20726-
20727- The first argument is an integer vector with the same type as the result.
20728-
20729- The second argument is a vector with a length that is a known integer multiple
20730- of the result's type, while maintaining the same element type.
20731-
20732- '``llvm.vector.partial.reduce.fadd.*``' Intrinsic
20733- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
20734-
20735- Syntax:
20736- """""""
20737- This is an overloaded intrinsic.
20738-
20739- ::
20740-
20741- declare <4 x f32> @llvm.vector.partial.reduce.fadd.v4f32.v8f32(<4 x f32> %a, <8 x f32> %b)
20742- declare <vscale x 4 x f32> @llvm.vector.partial.reduce.fadd.nxv4f32.nxv8f32(<vscale x 4 x f32> %a, <vscale x 8 x f32> %b)
20743-
20744- Arguments:
20745- """"""""""
20746-
20747- The first argument is a floating-point vector with the same type as the result.
20748-
20749- The second argument is a vector with a length that is a known integer multiple
20750- of the result's type, while maintaining the same element type.
20751-
20752- Semantics:
20753- """"""""""
20754-
20755- As the way in which the arguments to this floating-point intrinsic are reduced is unspecified,
20756- this intrinsic will reassociate floating-point values, which may result in variations to the
20757- results due to reordering or by lowering to different instructions.
20758-
2075920761'``llvm.experimental.vector.histogram.*``' Intrinsic
2076020762^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2076120763
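The partial reduction semantics deliberately leave open which input lanes feed which result lane. The C++ sketch below is a hedged illustration of why that matters. It does not describe any LLVM backend. It models two equally permitted lowerings of the `v4i32.v4i32.v16i32` and `v4f32.v8f32` forms, plus an out-of-loop final sum. The names `partialReduceChunked`, `partialReduceAdjacent` and `reduceAll` are hypothetical, invented for this example and not LLVM APIs. It assumes `std::uint32_t` models wrapping `i32` addition and that `float` is evaluated in IEEE single precision without fast-math flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lowering #1: input lane i accumulates into result lane
// (i % AccN), i.e. the wide input is folded in AccN-wide chunks.
template <typename T, std::size_t AccN, std::size_t InN>
std::array<T, AccN> partialReduceChunked(std::array<T, AccN> acc,
                                         const std::array<T, InN> &in) {
  static_assert(InN % AccN == 0,
                "input length must be a multiple of the result length");
  for (std::size_t i = 0; i < InN; ++i)
    acc[i % AccN] += in[i];
  return acc;
}

// Hypothetical lowering #2: each run of InN / AccN adjacent input lanes
// accumulates into a single result lane.
template <typename T, std::size_t AccN, std::size_t InN>
std::array<T, AccN> partialReduceAdjacent(std::array<T, AccN> acc,
                                          const std::array<T, InN> &in) {
  static_assert(InN % AccN == 0,
                "input length must be a multiple of the result length");
  constexpr std::size_t Factor = InN / AccN;
  for (std::size_t i = 0; i < InN; ++i)
    acc[i / Factor] += in[i];
  return acc;
}

// Hypothetical out-of-loop phase: sum the partial result lanes left to right.
template <typename T, std::size_t N>
T reduceAll(const std::array<T, N> &v) {
  T sum{};
  for (const T &x : v)
    sum += x;
  return sum;
}
```

Consider an accumulator of `{1, 2, 3, 4}` and `std::uint32_t` inputs `1..16`. The two lowerings produce different lanes, `{29, 34, 39, 44}` versus `{11, 28, 45, 62}`. They still produce the same final total, 146, because wrapping integer addition is associative.

Now consider a zero `float` accumulator and inputs `{1e8f, 1.0f, -1e8f, 1.0f, 0, 0, 0, 0}`. Here the final totals are `1.0f` and `0.0f` respectively. This is the kind of variation the `fadd` semantics permit.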