-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[mlir][Vector] Add utility for computing scalable value bounds #83876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-mlir-vector @llvm/pr-subscribers-mlir Author: Benjamin Maxwell (MacDue) ChangesThis adds a new API built with the The result is an The API is defined as follows: FailureOr<ConstantOrScalableBound>
vector::computeScalableBound(Value value, std::optional<int64_t> dim,
unsigned vscaleMin, unsigned vscaleMax,
presburger::BoundType boundType);Note: We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors). Patch is 23.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83876.diff 6 Files Affected:
diff --git a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
index f6b03a0f2c8007..635d609d0b3e71 100644
--- a/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
+++ b/mlir/include/mlir/Dialect/Vector/Utils/VectorUtils.h
@@ -9,6 +9,7 @@
#ifndef MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
#define MLIR_DIALECT_VECTOR_UTILS_VECTORUTILS_H_
+#include "mlir/Analysis/Presburger/IntegerRelation.h"
#include "mlir/Dialect/Utils/IndexingUtils.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/IR/BuiltinAttributes.h"
@@ -98,6 +99,34 @@ bool isContiguousSlice(MemRefType memrefType, VectorType vectorType);
std::optional<StaticTileOffsetRange>
createUnrollIterator(VectorType vType, int64_t targetRank = 1);
+struct ConstantOrScalableBound {
+ AffineMap map;
+
+ struct BoundSize {
+ int64_t baseSize{0};
+ bool scalable{false};
+ };
+
+ /// Get the (possibly) scalable size of the bound, returns failure if the
+ /// bound cannot be represented as a single quantity.
+ FailureOr<BoundSize> getSize() const;
+};
+
+/// Computes a (possibly) scalable bound for a given value. This is similar to
+/// `ValueBoundsConstraintSet::computeConstantBound()`, but uses knowledge of
+/// the range of vscale to compute either a constant bound, an expression in
+/// terms of vscale, or failure if no bound can be computed.
+///
+/// The resulting `AffineMap` will always take at most one parameter, vscale,
+/// and return a single result, which is the bound of `value`.
+///
+/// Note: `vscaleMin` must be `<=` to `vscaleMax`. If `vscaleMin` ==
+/// `vscaleMax`, the resulting bound (if found), will be constant.
+FailureOr<ConstantOrScalableBound>
+computeScalableBound(Value value, std::optional<int64_t> dim,
+ unsigned vscaleMin, unsigned vscaleMax,
+ presburger::BoundType boundType);
+
} // namespace vector
/// Constructs a permutation map of invariant memref indices to vector
diff --git a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
index 28dadfb9ecf868..6d0e16bf215f8a 100644
--- a/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
+++ b/mlir/include/mlir/Interfaces/ValueBoundsOpInterface.h
@@ -265,10 +265,28 @@ class ValueBoundsConstraintSet {
ValueBoundsConstraintSet(MLIRContext *ctx);
+ /// A callback to allow injecting custom value bounds constraints.
+ /// It takes the current value, the dim (or kIndexValue), and a reference to
+ /// the constraints set.
+ using PopulateCustomValueBoundsFn =
+ function_ref<void(Value, int64_t, ValueBoundsConstraintSet &)>;
+
+ /// Populates the constraint set for a value/map without actually computing
+ /// the bound.
+ int64_t populateConstraintsSet(
+ Value value, std::optional<int64_t> dim = std::nullopt,
+ PopulateCustomValueBoundsFn customValueBounds = nullptr,
+ StopConditionFn stopCondition = nullptr);
+ int64_t populateConstraintsSet(
+ AffineMap map, ValueDimList mapOperands,
+ PopulateCustomValueBoundsFn customValueBounds = nullptr,
+ StopConditionFn stopCondition = nullptr, int64_t *posOut = nullptr);
+
/// Iteratively process all elements on the worklist until an index-typed
/// value or shaped value meets `stopCondition`. Such values are not processed
/// any further.
- void processWorklist(StopConditionFn stopCondition);
+ void processWorklist(StopConditionFn stopCondition,
+ PopulateCustomValueBoundsFn customValueBounds = nullptr);
/// Bound the given column in the underlying constraint set by the given
/// expression.
diff --git a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
index d613672608c3ad..77b2ad6ff540ce 100644
--- a/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
+++ b/mlir/lib/Dialect/Vector/Utils/VectorUtils.cpp
@@ -24,6 +24,7 @@
#include "mlir/IR/IntegerSet.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/TypeUtilities.h"
+#include "mlir/Interfaces/ValueBoundsOpInterface.h"
#include "mlir/Support/LLVM.h"
#include "mlir/Support/MathExtras.h"
@@ -300,3 +301,132 @@ vector::createUnrollIterator(VectorType vType, int64_t targetRank) {
shapeToUnroll = shapeToUnroll.slice(0, firstScalableDim);
return StaticTileOffsetRange(shapeToUnroll, /*unrollStep=*/1);
}
+
+FailureOr<vector::ConstantOrScalableBound::BoundSize>
+vector::ConstantOrScalableBound::getSize() const {
+ if (map.isSingleConstant())
+ return BoundSize{map.getSingleConstantResult(), /*scalable=*/false};
+ if (map.getNumResults() != 1 || map.getNumInputs() != 1)
+ return failure();
+ auto binop = dyn_cast<AffineBinaryOpExpr>(map.getResult(0));
+ if (!binop || binop.getKind() != AffineExprKind::Mul)
+ return failure();
+ auto matchConstant = [&](AffineExpr expr, int64_t &constant) -> bool {
+ if (auto cst = dyn_cast<AffineConstantExpr>(expr)) {
+ constant = cst.getValue();
+ return true;
+ }
+ return false;
+ };
+ // Match `s0 * cst` or `cst * s0`:
+ int64_t cst = 0;
+ auto lhs = binop.getLHS();
+ auto rhs = binop.getRHS();
+ if ((matchConstant(lhs, cst) && isa<AffineSymbolExpr>(rhs)) ||
+ (matchConstant(rhs, cst) && isa<AffineSymbolExpr>(lhs))) {
+ return BoundSize{cst, /*scalable=*/true};
+ }
+ return failure();
+}
+
+namespace {
+struct ScalableValueBoundsConstraintSet : public ValueBoundsConstraintSet {
+ using ValueBoundsConstraintSet::ValueBoundsConstraintSet;
+
+ static Operation *getOwnerOfValue(Value value) {
+ if (auto bbArg = dyn_cast<BlockArgument>(value))
+ return bbArg.getOwner()->getParentOp();
+ return value.getDefiningOp();
+ }
+
+ static FailureOr<AffineMap>
+ computeScalableBound(Value value, std::optional<int64_t> dim,
+ unsigned vscaleMin, unsigned vscaleMax,
+ presburger::BoundType boundType) {
+ using namespace presburger;
+
+ assert(vscaleMin <= vscaleMax);
+ ScalableValueBoundsConstraintSet cstr(value.getContext());
+
+ Value vscale;
+ int64_t pos = cstr.populateConstraintsSet(
+ value, dim,
+ /* Custom vscale value bounds */
+ [&vscale, vscaleMin, vscaleMax](Value value, int64_t dim,
+ ValueBoundsConstraintSet &cstr) {
+ if (dim != ValueBoundsConstraintSet::kIndexValue)
+ return;
+ if (isa_and_present<vector::VectorScaleOp>(getOwnerOfValue(value))) {
+ if (vscale) {
+ // All copies of vscale are equivalent.
+ cstr.bound(value) == cstr.getExpr(vscale);
+ } else {
+ // We know vscale is confined to [vscaleMin, vscaleMax].
+ cstr.bound(value) >= vscaleMin;
+ cstr.bound(value) <= vscaleMax;
+ vscale = value;
+ }
+ }
+ },
+ /* Stop condition */
+ [](auto, auto) {
+ // Keep adding constraints till the worklist is empty.
+ return false;
+ });
+
+ // Project out all variables apart from the first vscale.
+ cstr.projectOut([&](ValueDim p) { return p.first != vscale; });
+
+ assert(cstr.cstr.getNumDimAndSymbolVars() ==
+ cstr.positionToValueDim.size() &&
+ "inconsistent mapping state");
+
+ for (int64_t i = 0; i < cstr.cstr.getNumDimAndSymbolVars(); ++i) {
+ if (i == pos)
+ continue;
+ if (cstr.positionToValueDim[i] !=
+ ValueDim(vscale, ValueBoundsConstraintSet::kIndexValue)) {
+ return failure();
+ }
+ }
+
+ SmallVector<AffineMap, 1> lowerBound(1), upperBound(1);
+ cstr.cstr.getSliceBounds(pos, 1, value.getContext(), &lowerBound,
+ &upperBound,
+ /*closedUB=*/true);
+
+ auto invalidBound = [](auto &bound) {
+ return !bound[0] || bound[0].getNumResults() != 1;
+ };
+
+ AffineMap bound = [&] {
+ if (boundType == BoundType::EQ && !invalidBound(lowerBound) &&
+ lowerBound[0] == lowerBound[0]) {
+ return lowerBound[0];
+ } else if (boundType == BoundType::LB && !invalidBound(lowerBound)) {
+ return lowerBound[0];
+ } else if (boundType == BoundType::UB && !invalidBound(upperBound)) {
+ return upperBound[0];
+ }
+ return AffineMap{};
+ }();
+
+ if (!bound)
+ return failure();
+
+ return bound;
+ }
+};
+
+} // namespace
+
+FailureOr<vector::ConstantOrScalableBound>
+vector::computeScalableBound(Value value, std::optional<int64_t> dim,
+ unsigned vscaleMin, unsigned vscaleMax,
+ presburger::BoundType boundType) {
+ auto bound = ScalableValueBoundsConstraintSet::computeScalableBound(
+ value, dim, vscaleMin, vscaleMax, boundType);
+ if (failed(bound))
+ return failure();
+ return ConstantOrScalableBound{*bound};
+}
diff --git a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
index 85abc2df894797..ac4e3b935a0542 100644
--- a/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
+++ b/mlir/lib/Interfaces/ValueBoundsOpInterface.cpp
@@ -191,7 +191,9 @@ static Operation *getOwnerOfValue(Value value) {
return value.getDefiningOp();
}
-void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
+void ValueBoundsConstraintSet::processWorklist(
+ StopConditionFn stopCondition,
+ PopulateCustomValueBoundsFn customValueBounds) {
while (!worklist.empty()) {
int64_t pos = worklist.front();
worklist.pop();
@@ -215,8 +217,11 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
if (stopCondition(value, maybeDim))
continue;
- // Query `ValueBoundsOpInterface` for constraints. New items may be added to
- // the worklist.
+ // 1. Query `customValueBounds` for constraints (if provided).
+ if (customValueBounds)
+ customValueBounds(value, dim, *this);
+
+ // 2. Query `ValueBoundsOpInterface` for constraints.
auto valueBoundsOp =
dyn_cast<ValueBoundsOpInterface>(getOwnerOfValue(value));
if (valueBoundsOp) {
@@ -228,6 +233,8 @@ void ValueBoundsConstraintSet::processWorklist(StopConditionFn stopCondition) {
continue;
}
+ // Steps 1 and 2 above may add new items to the worklist.
+
// If the op does not implement `ValueBoundsOpInterface`, check if it
// implements the `DestinationStyleOpInterface`. OpResults of such ops are
// tied to OpOperands. Tied values have the same shape.
@@ -471,55 +478,92 @@ FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
closedUB);
}
+FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
+ presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
+ StopConditionFn stopCondition, bool closedUB) {
+ ValueDimList valueDims;
+ for (Value v : operands) {
+ assert(v.getType().isIndex() && "expected index type");
+ valueDims.emplace_back(v, std::nullopt);
+ }
+ return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+}
+
FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
presburger::BoundType type, AffineMap map, ValueDimList operands,
StopConditionFn stopCondition, bool closedUB) {
assert(map.getNumResults() == 1 && "expected affine map with one result");
ValueBoundsConstraintSet cstr(map.getContext());
- int64_t pos = cstr.insert(/*isSymbol=*/false);
+
+ int64_t pos = 0;
+ if (stopCondition) {
+ cstr.populateConstraintsSet(map, operands, nullptr, stopCondition, &pos);
+ } else {
+ // No stop condition specified: Keep adding constraints until a bound could
+ // be computed.
+ cstr.populateConstraintsSet(
+ map, operands, nullptr,
+ [&](Value v, std::optional<int64_t> dim) {
+ return cstr.cstr.getConstantBound64(type, pos).has_value();
+ },
+ &pos);
+ }
+ // Compute constant bound for `valueDim`.
+ int64_t ubAdjustment = closedUB ? 0 : 1;
+ if (auto bound = cstr.cstr.getConstantBound64(type, pos))
+ return type == BoundType::UB ? *bound + ubAdjustment : *bound;
+ return failure();
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+ Value value, std::optional<int64_t> dim,
+ PopulateCustomValueBoundsFn customValueBounds,
+ StopConditionFn stopCondition) {
+#ifndef NDEBUG
+ assertValidValueDim(value, dim);
+#endif // NDEBUG
+
+ AffineMap map =
+ AffineMap::get(/*dimCount=*/1, /*symbolCount=*/0,
+ Builder(value.getContext()).getAffineDimExpr(0));
+ return populateConstraintsSet(map, {{value, dim}}, customValueBounds,
+ stopCondition);
+}
+
+int64_t ValueBoundsConstraintSet::populateConstraintsSet(
+ AffineMap map, ValueDimList operands,
+ PopulateCustomValueBoundsFn customValueBounds,
+ StopConditionFn stopCondition, int64_t *posOut) {
+ assert(map.getNumResults() == 1 && "expected affine map with one result");
+ int64_t pos = insert(/*isSymbol=*/false);
+ if (posOut)
+ *posOut = pos;
// Add map and operands to the constraint set. Dimensions are converted to
// symbols. All operands are added to the worklist.
auto mapper = [&](std::pair<Value, std::optional<int64_t>> v) {
- return cstr.getExpr(v.first, v.second);
+ return getExpr(v.first, v.second);
};
SmallVector<AffineExpr> dimReplacements = llvm::to_vector(
llvm::map_range(ArrayRef(operands).take_front(map.getNumDims()), mapper));
SmallVector<AffineExpr> symReplacements = llvm::to_vector(
llvm::map_range(ArrayRef(operands).drop_front(map.getNumDims()), mapper));
- cstr.addBound(
+ addBound(
presburger::BoundType::EQ, pos,
map.getResult(0).replaceDimsAndSymbols(dimReplacements, symReplacements));
// Process the backward slice of `operands` (i.e., reverse use-def chain)
// until `stopCondition` is met.
if (stopCondition) {
- cstr.processWorklist(stopCondition);
+ processWorklist(stopCondition, customValueBounds);
} else {
- // No stop condition specified: Keep adding constraints until a bound could
- // be computed.
- cstr.processWorklist(
- /*stopCondition=*/[&](Value v, std::optional<int64_t> dim) {
- return cstr.cstr.getConstantBound64(type, pos).has_value();
- });
+ // No stop condition specified: Keep adding constraints until the worklist
+ // is empty.
+ processWorklist([](Value v, std::optional<int64_t> dim) { return false; },
+ customValueBounds);
}
- // Compute constant bound for `valueDim`.
- int64_t ubAdjustment = closedUB ? 0 : 1;
- if (auto bound = cstr.cstr.getConstantBound64(type, pos))
- return type == BoundType::UB ? *bound + ubAdjustment : *bound;
- return failure();
-}
-
-FailureOr<int64_t> ValueBoundsConstraintSet::computeConstantBound(
- presburger::BoundType type, AffineMap map, ArrayRef<Value> operands,
- StopConditionFn stopCondition, bool closedUB) {
- ValueDimList valueDims;
- for (Value v : operands) {
- assert(v.getType().isIndex() && "expected index type");
- valueDims.emplace_back(v, std::nullopt);
- }
- return computeConstantBound(type, map, valueDims, stopCondition, closedUB);
+ return pos;
}
FailureOr<int64_t>
diff --git a/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
new file mode 100644
index 00000000000000..2afc4db874b73e
--- /dev/null
+++ b/mlir/test/Dialect/Vector/test-scalable-upper-bound.mlir
@@ -0,0 +1,137 @@
+// RUN: mlir-opt %s -test-affine-reify-value-bounds -cse -verify-diagnostics \
+// RUN: -verify-diagnostics -split-input-file | FileCheck %s
+
+#fixedDim0Map = affine_map<(d0)[s0] -> (-d0 + 32400, s0)>
+#fixedDim1Map = affine_map<(d0)[s0] -> (-d0 + 16, s0)>
+
+// Here the upper bound for min_i is 4 x vscale, as we know 4 x vscale is
+// always less than 32400. The bound for min_j is 16 as at vscale > 4,
+// 4 x vscale will be > 16, so the value will be clamped at 16.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_0:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @fixed_size_loop_nest
+// CHECK-DAG: %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_0]]()[%vscale]
+// CHECK-DAG: %[[C16:.*]] = arith.constant 16 : index
+// CHECK: "test.some_use"(%[[SCALABLE_BOUND]], %[[C16]]) : (index, index) -> ()
+func.func @fixed_size_loop_nest() {
+ %c16 = arith.constant 16 : index
+ %c32400 = arith.constant 32400 : index
+ %c4 = arith.constant 4 : index
+ %c0 = arith.constant 0 : index
+ %vscale = vector.vscale
+ %c4_vscale = arith.muli %vscale, %c4 : index
+ scf.for %i = %c0 to %c32400 step %c4_vscale {
+ %min_i = affine.min #fixedDim0Map(%i)[%c4_vscale]
+ scf.for %j = %c0 to %c16 step %c4_vscale {
+ %min_j = affine.min #fixedDim1Map(%j)[%c4_vscale]
+ %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+ %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+ }
+ }
+ return
+}
+
+// -----
+
+#dynamicDim0Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+#dynamicDim1Map = affine_map<(d0, d1)[s0] -> (-d0 + d1, s0)>
+
+// Here upper bounds for both min_i and min_j are both 4 x vscale, as we know
+// that is always the largest value they could take. As if `dim < 4 x vscale`
+// then 4 x vscale is an overestimate, and if `dim > 4 x vscale` then the min
+// will be clamped to 4 x vscale.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_1:.*]] = affine_map<()[s0] -> (s0 * 4)>
+
+// CHECK-LABEL: @dynamic_size_loop_nest
+// CHECK: %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_1]]()[%vscale]
+// CHECK: "test.some_use"(%[[SCALABLE_BOUND]], %[[SCALABLE_BOUND]]) : (index, index) -> ()
+func.func @dynamic_size_loop_nest(%dim0: index, %dim1: index) {
+ %c4 = arith.constant 4 : index
+ %c0 = arith.constant 0 : index
+ %vscale = vector.vscale
+ %c4_vscale = arith.muli %vscale, %c4 : index
+ scf.for %i = %c0 to %dim0 step %c4_vscale {
+ %min_i = affine.min #dynamicDim0Map(%i)[%c4_vscale, %dim0]
+ scf.for %j = %c0 to %dim1 step %c4_vscale {
+ %min_j = affine.min #dynamicDim1Map(%j)[%c4_vscale, %dim1]
+ %bound_i = "test.reify_scalable_bound"(%min_i) {type = "UB"} : (index) -> index
+ %bound_j = "test.reify_scalable_bound"(%min_j) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound_i, %bound_j) : (index, index) -> ()
+ }
+ }
+ return
+}
+
+// -----
+
+// Here the upper bound is just a value + a constant.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_2:.*]] = affine_map<()[s0] -> (s0 + 8)>
+
+// CHECK-LABEL: @add_to_vscale
+// CHECK: %[[SCALABLE_BOUND:.*]] = affine.apply #[[$SCALABLE_BOUND_MAP_2]]()[%vscale]
+// CHECK: "test.some_use"(%[[SCALABLE_BOUND]]) : (index) -> ()
+func.func @add_to_vscale() {
+ %vscale = vector.vscale
+ %c8 = arith.constant 8 : index
+ %vscale_plus_c8 = arith.addi %vscale, %c8 : index
+ %bound = "test.reify_scalable_bound"(%vscale_plus_c8) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound) : (index) -> ()
+ return
+}
+
+// -----
+
+// Here we know vscale is always 2 so we get a constant upper bound.
+
+// CHECK-LABEL: @vscale_fixed_size
+// CHECK: %[[C2:.*]] = arith.constant 2 : index
+// CHECK: "test.some_use"(%[[C2]]) : (index) -> ()
+func.func @vscale_fixed_size() {
+ %vscale = vector.vscale
+ %bound = "test.reify_scalable_bound"(%vscale) {type = "UB", vscale_min = 2, vscale_max = 2} : (index) -> index
+ "test.some_use"(%bound) : (index) -> ()
+ return
+}
+
+// -----
+
+// Here we don't know the upper bound (%a is underspecified)
+
+func.func @unknown_bound(%a: index) {
+ %vscale = vector.vscale
+ %vscale_plus_a = arith.muli %vscale, %a : index
+ // expected-error @below{{could not reify bound}}
+ %bound = "test.reify_scalable_bound"(%vscale_plus_a) {type = "UB"} : (index) -> index
+ "test.some_use"(%bound) : (index) -> ()
+ return
+}
+
+// -----
+
+// Here we have two vscale values (that have not been CSE'd), but they should
+// still be treated as equivalent.
+
+// CHECK: #[[$SCALABLE_BOUND_MAP_3:.*]] = affine_map<()[s0] -> (s0 * 6)>
+
+// CHECK-LABEL: @duplicate_vscale_values
+// CHECK: %[...
[truncated]
|
|
@matthias-springer I think you implemented ValueBoundsConstraintSet in the past, could you help review ? |
banach-space
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, left a few minor comments, thanks!
banach-space
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few more comments from me. I'm not familiar with this area, hence these are mostly requests for more clarification. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to see if we can reuse more functionality from ValueBoundsConstraintSet, so less has to be reimplemented here.
One thing that stands out to me in this PR is that vector.vscale does not implement ValueBoundsOpInterface.
Here's an idea that could work:
ScalableValueBoundsConstraintSetdefines 3 fields:vscaleMin,vscaleMaxandvscale.vscaleMinandvscaleMaxare initialized in the constructor ofScalableValueBoundsConstraintSet.vscaleis first seenvscaleSSA value. "empty" value in the beginning.- The
ValueBoundsOpInterface::populateBoundsForIndexValueimplementation dyn_castscstrtoScalableValueBoundsConstraintSet &. If that succeeds, we can getvscaleMin,vscaleMaxandvscaleand implement the same logic that you have here without thePopulateCustomValueBoundsFncallback. - At that point, we may just be able to call
ValueBoundsConstraintSet::computeDependentBound, which gives you a bound in which only certain SSA values (vscale) are allowed to appear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think ValueBoundsConstraintSet::computeDependentBound() does exactly what I want here. First, I'm not sure I want to stop at the first vscale found, also computeDependentBound() does not project out the valueDim, which in my experiments gave affine_maps that take two input symbols (presumably one for vscale and the valueDim), which did not usefully describe the bound.
Only once I eliminated all variables except vscale did I get the bounds I was interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I think the other points above could work 👍)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into the above, and I don't really think it helps much with reuse, and is a little complex to setup. To be able to dyn_cast the ValueBoundConstraintSet we'd need to implement LLVM RTTI for it, and then it still only makes sense to use the helper on ScalableValueBoundsConstraintSet to compute the bound. The other variants construct a non-scalable ValueBoundsConstraintSet internally, so would not be useful for computing bounds dependent on vscale.
And like I mentioned before none of the existing bounds functions do exactly what I want here, so I think I'd still need computeScalableBound() (and it's accompanying helpers).
I think it could be done, but I think (right now) all it'd really eliminate is the PopulateCustomValueBoundsFn callback.
Also, in the current implementation ScalableValueBoundsConstraintSet is implementation detail, that's not exposed in any header files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give it a try? I was mainly worried about adding new API (PopulateCustomValueBoundsFn) to ValueBoundsOpInterface when it is not needed.
Setting up the RTTI should be pretty easy, you can copy this: 6a43523
It would indeed only make sense for ScalableValueBoundsConstraintSet, but I don't see a problem with that.
I don't think
ValueBoundsConstraintSet::computeDependentBound()does exactly what I want here. First, I'm not sure I want to stop at the firstvscalefound, alsocomputeDependentBound()does not project out thevalueDim, which in my experiments gaveaffine_mapsthat take two input symbols (presumably one forvscaleand thevalueDim), which did not usefully describe the bound.Only once I eliminated all variables except
vscaledid I get the bounds I was interested in.
At the moment, stopFunction controls how far the traversal goes and also what SSA values to project out. It looks like you need some more fine-grained control here: computeScalableBound adds constraints for vscale, but it does not project vscale out. I don't really mind having a bit of duplicated functionality in computeScalableBound; as you said it's quite isolated. I can look into adding the missing configuration options in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now reworked things things as suggested above (which has removed the PopulateCustomValueBoundsFn callback).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need all this pattern matching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for converting the affine_map to a single quantity. In my mind computeScalableBound() is just like computeConstantBound(), but it supports scalability. So in many cases you can convert the bound to a single (possibly scalable) value (which is simply represented with a int64_t and a scalable flag, like it is in the vector type).
mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h
Outdated
Show resolved
Hide resolved
mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h
Outdated
Show resolved
Hide resolved
mlir/include/mlir/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.h
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like this test functionality belongs to the vector dialect. But then we would also have to duplicate some functionality there. I'll leave it up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just leave this here for now, it is slightly awkwardly placed, but still convenient for testing.
banach-space
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for implementing this Ben!
Few more nits, but feel free to ignore - mostly me trying to grasp few more bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] vscale -> vector.vscale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm referring to the concept of vscale, not the vector dialect operation.
mlir/lib/Dialect/Vector/IR/ScalableValueBoundsConstraintSet.cpp
Outdated
Show resolved
Hide resolved
This adds a new API built with the `ValueBoundsConstraintSet` to compute
the bounds of possibly scalable quantities. It uses knowledge of the
range of vscale (which is defined by the target architecture), to solve
for the bound as either a constant or an expression in terms of vscale.
The result is an `AffineMap` will always take at most one parameter,
vscale, and return a single result, which is the bound of `value`.
The API is defined as follows:
```c++
FailureOr<ConstantOrScalableBound>
vector::computeScalableBound(Value value, std::optional<int64_t> dim,
unsigned vscaleMin, unsigned vscaleMax,
presburger::BoundType boundType);
```
Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap`
with a utility for converting the bound to a single quantity (i.e. a
size and scalable flag).
We believe this API could prove useful downstream in IREE (which uses
a similar analysis to hoist allocas, which currently fails for scalable
vectors).
Squashed commits:
Fix `affine_map` in dynamic dim test
Require vscale min/max to be explicit in tests
Rewrite to use RTTI and vscale impl of `ValueBoundsOpInterface`
Expose computeScalableBound() `stopCondition`
Hide ValueBoundsConstraintSet methods on ScalableValueBoundsConstraintSet
|
Thanks a lot for the reviews! Big thanks to Matthias too (as the expert here) :) I'll land this tomorrow if there's no further comments. |
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
…83876) This adds a new API built with the `ValueBoundsConstraintSet` to compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale. The result is an `AffineMap` that will always take at most one parameter, vscale, and returns a single result, which is the bound of `value`. The API is defined as follows: ```c++ FailureOr<ConstantOrScalableBound> vector::ScalableValueBoundsConstraintSet::computeScalableBound( Value value, std::optional<int64_t> dim, unsigned vscaleMin, unsigned vscaleMax, presburger::BoundType boundType, bool closedUB = true, StopConditionFn stopCondition = nullptr); ``` Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap` with a utility for converting the bound to a single quantity (i.e. a size and scalable flag). We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations. Squashed: Review fixups Review fixups (round 2) CI fix Add `iree-llvmcpu-stack-allocation-assumed-vscale` option This allows changing the value of vscale used when checking if (scalable) allocations are within the stack limit. Internal suggestion from @ banach-space.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1`, since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations.
#16869) This makes use of recent work (llvm/llvm-project#83876) to support scalability in value bounds analysis. With these changes scalable allocations (i.e. allocations that scale with vscale), can now be hoisted and pass stack limit checking. Hoisting scalable allocations requires knowledge of the range of vscale, which is taken from the target (using the maximum range defined by the ISA). The stack limit checking just assumes `vscale = 1` (by default), since this avoids needing to look at the target (and always resolves to a constant bound), and still prevents excessive allocations. ci-extra: build_test_all_arm64
This adds a new API built with the
ValueBoundsConstraintSetto compute the bounds of possibly scalable quantities. It uses knowledge of the range of vscale (which is defined by the target architecture), to solve for the bound as either a constant or an expression in terms of vscale.The result is an
AffineMapthat will always take at most one parameter, vscale, and returns a single result, which is the bound ofvalue.The API is defined as follows:
Note:
ConstantOrScalableBoundis a thin wrapper over theAffineMapwith a utility for converting the bound to a single quantity (i.e. a size and scalable flag).We believe this API could prove useful downstream in IREE (which uses a similar analysis to hoist allocas, which currently fails for scalable vectors).