[DA] Add initial support for monotonicity check #162280
Conversation
/// Note that we don't check if the step recurrence can be zero. For
/// example,an AddRec `{0,+,%a}<nsw> is classifed as Monotonic if `%a` can be
/// zero. That is, the expression can be Invariant.
MultiSignedMonotonic,
I think this name is not good. Please let me know if you have a better one. (It would be better if the name also implied that the step value is loop invariant.)
MultivariateMonotonic
For a mathematical (monotonic) function, invariants are just constants that do not appear in the function's domain.
struct SCEVMonotonicityChecker
    : public SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity> {
As for testability, maybe it is better to split this out into a separate file, like ScalarEvolutionDivision.cpp? Or would it be better to avoid creating separate files unnecessarily?
; if (cond)
;   a[i + j] = 0;
;
define void @conditional_store0(ptr %a, i64 %n, i64 %m) {
According to #159846 (comment), apparently we cannot infer nsw in this case. I still need to check practical cases; this might be serious, and the analysis might turn out to be too weak...
Just looking at the pseudo-C and your comment above: something like this might be a test case for SCEV, but not for DA. In DA I believe you should just close your eyes and assume any nsw/nuw flag given to you is correct.
(EDIT: of course it is good to have a test case that checks that when nsw/nuw doesn't exist, we do not prove monotonicity.)
As your EDIT says, this is a case where SCEV cannot infer nsw, so we cannot prove monotonicity; it is not a case where we don't trust a given nsw flag. The purpose of this test is to document the limitations of the analysis.
@@ -0,0 +1,459 @@
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
For now, I've added all the cases I could think of. Please let me know if there's anything else you're interested in.
@llvm/pr-subscribers-llvm-analysis

Author: Ryotaro Kasuga (kasuga-fj)

Changes

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. This means that DA cannot analyze AddRecs that may wrap, and should conservatively return Unknown dependence for such cases. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of monotonicity and a validation logic to check whether an AddRec is monotonic. The monotonicity check classifies the subscript of a memory access into one of the following categories:

The current validation logic basically searches an AddRec recursively and checks whether the

Split off from #154527.

Patch is 38.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162280.diff

4 Files Affected:
 diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 1f0da8d1830d3..a3134f8571481 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -128,6 +128,18 @@ static cl::opt<bool> RunSIVRoutinesOnly(
              "The purpose is mainly to exclude the influence of those routines "
              "in regression tests for SIV routines."));
 
+// TODO: This flag is disabled by default because it is still under development.
+// Enable it or delete this flag when the feature is ready.
+static cl::opt<bool> EnableMonotonicityCheck(
+    "da-enable-monotonicity-check", cl::init(false), cl::Hidden,
+    cl::desc("Check if the subscripts are monotonic. If it's not, dependence "
+             "is reported as unknown."));
+
+static cl::opt<bool> DumpMonotonicityReport(
+    "da-dump-monotonicity-report", cl::init(false), cl::Hidden,
+    cl::desc(
+        "When printing analysis, dump the results of monotonicity checks."));
+
 //===----------------------------------------------------------------------===//
 // basics
 
@@ -177,13 +189,189 @@ void DependenceAnalysisWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequiredTransitive<LoopInfoWrapperPass>();
 }
 
+namespace {
+
+/// The type of monotonicity of a SCEV. This property is defined with respect to
+/// the outermost loop that DA is analyzing.
+///
+/// This is designed to classify the behavior of AddRec expressions, and does
+/// not care about other SCEVs. For example, given the two loop-invariant values
+/// `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.
+enum class SCEVMonotonicityType {
+  /// The expression is neither loop-invariant nor monotonic (or we fail to
+  /// prove it).
+  Unknown,
+
+  /// The expression is loop-invariant with respect to the outermost loop.
+  Invariant,
+
+  /// The expression is a (nested) affine AddRec and is monotonically increasing
+  /// or decreasing in a signed sense with respect to each loop. Monotonicity is
+  /// checked independently for each loop, and the expression is classified as
+  /// MultiSignedMonotonic if all AddRecs are nsw. For example, in the following
+  /// loop:
+  ///
+  ///   for (i = 0; i < 100; i++)
+  ///     for (j = 0; j < 100; j++)
+  ///       A[i + j] = ...;
+  ///
+  /// The SCEV for `i + j` is classified as MultiSignedMonotonic. On the other
+  /// hand, in the following loop:
+  ///
+  ///   for (i = 0; i < 100; i++)
+  ///     for (j = 0; j <= (1ULL << 63); j++)
+  ///       A[i + j] = ...;
+  ///
+  /// The SCEV for `i + j` is NOT classified as MultiMonotonic, because the
+  /// AddRec for `j` wraps in a signed sense. We don't consider the "direction"
+  /// of each AddRec. For example, in the following loop:
+  ///
+  ///  for (int i = 0; i < 100; i++)
+  ///    for (int j = 0; j < 100; j++)
+  ///      A[i - j] = ...;
+  ///
+  /// The SCEV for `i - j` is classified as MultiSignedMonotonic, even though it
+  /// contains both increasing and decreasing AddRecs.
+  ///
+  /// Note that we don't check if the step recurrence can be zero. For
+  /// example,an AddRec `{0,+,%a}<nsw> is classifed as Monotonic if `%a` can be
+  /// zero. That is, the expression can be Invariant.
+  MultiSignedMonotonic,
+};
+
+struct SCEVMonotonicity {
+  SCEVMonotonicity(SCEVMonotonicityType Type,
+                   const SCEV *FailurePoint = nullptr);
+
+  SCEVMonotonicityType getType() const { return Type; }
+
+  const SCEV *getFailurePoint() const { return FailurePoint; }
+
+  bool isUnknown() const { return Type == SCEVMonotonicityType::Unknown; }
+
+  void print(raw_ostream &OS, unsigned Depth) const;
+
+private:
+  SCEVMonotonicityType Type;
+
+  /// The subexpression that caused Unknown. Mainly for debugging purpose.
+  const SCEV *FailurePoint;
+};
+
+struct SCEVMonotonicityChecker
+    : public SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity> {
+
+  SCEVMonotonicityChecker(ScalarEvolution *SE) : SE(SE) {}
+
+  /// Check the monotonicity of \p Expr. \p Expr must be integer type. If \p
+  /// OutermostLoop is not null, \p Expr must be defined in \p OutermostLoop or
+  /// one of its nested loops.
+  SCEVMonotonicity checkMonotonicity(const SCEV *Expr,
+                                     const Loop *OutermostLoop);
+
+private:
+  ScalarEvolution *SE;
+
+  /// The outermost loop that DA is analyzing.
+  const Loop *OutermostLoop;
+
+  /// A helper to classify \p Expr as either Invariant or Unknown.
+  SCEVMonotonicity invariantOrUnknown(const SCEV *Expr);
+
+  /// Return true if \p Expr is loop-invariant with respect to the outermost
+  /// loop.
+  bool isLoopInvariant(const SCEV *Expr) const;
+
+  /// A helper to create an Unknown SCEVMonotonicity.
+  SCEVMonotonicity createUnknown(const SCEV *FailurePoint) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Unknown, FailurePoint);
+  }
+
+  SCEVMonotonicity visitAddRecExpr(const SCEVAddRecExpr *Expr);
+
+  SCEVMonotonicity visitConstant(const SCEVConstant *) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  }
+  SCEVMonotonicity visitVScale(const SCEVVScale *) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  }
+
+  // TODO: Handle more cases.
+  SCEVMonotonicity visitZeroExtendExpr(const SCEVZeroExtendExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSignExtendExpr(const SCEVSignExtendExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitAddExpr(const SCEVAddExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitMulExpr(const SCEVMulExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitPtrToIntExpr(const SCEVPtrToIntExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitTruncateExpr(const SCEVTruncateExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUDivExpr(const SCEVUDivExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSMaxExpr(const SCEVSMaxExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUMaxExpr(const SCEVUMaxExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSMinExpr(const SCEVSMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUMinExpr(const SCEVUMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSequentialUMinExpr(const SCEVSequentialUMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUnknown(const SCEVUnknown *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitCouldNotCompute(const SCEVCouldNotCompute *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+
+  friend struct SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity>;
+};
+
+} // anonymous namespace
+
 // Used to test the dependence analyzer.
 // Looks through the function, noting instructions that may access memory.
 // Calls depends() on every possible pair and prints out the result.
 // Ignores all other instructions.
 static void dumpExampleDependence(raw_ostream &OS, DependenceInfo *DA,
-                                  ScalarEvolution &SE, bool NormalizeResults) {
+                                  ScalarEvolution &SE, LoopInfo &LI,
+                                  bool NormalizeResults) {
   auto *F = DA->getFunction();
+
+  if (DumpMonotonicityReport) {
+    SCEVMonotonicityChecker Checker(&SE);
+    OS << "Monotonicity check:\n";
+    for (Instruction &Inst : instructions(F)) {
+      if (!isa<LoadInst>(Inst) && !isa<StoreInst>(Inst))
+        continue;
+      Value *Ptr = getLoadStorePointerOperand(&Inst);
+      const Loop *L = LI.getLoopFor(Inst.getParent());
+      const SCEV *PtrSCEV = SE.getSCEVAtScope(Ptr, L);
+      const SCEV *AccessFn = SE.removePointerBase(PtrSCEV);
+      SCEVMonotonicity Mon = Checker.checkMonotonicity(AccessFn, L);
+      OS.indent(2) << "Inst: " << Inst << "\n";
+      OS.indent(4) << "Expr: " << *AccessFn << "\n";
+      Mon.print(OS, 4);
+    }
+    OS << "\n";
+  }
+
   for (inst_iterator SrcI = inst_begin(F), SrcE = inst_end(F); SrcI != SrcE;
        ++SrcI) {
     if (SrcI->mayReadOrWriteMemory()) {
@@ -235,7 +423,8 @@ static void dumpExampleDependence(raw_ostream &OS, DependenceInfo *DA,
 void DependenceAnalysisWrapperPass::print(raw_ostream &OS,
                                           const Module *) const {
   dumpExampleDependence(
-      OS, info.get(), getAnalysis<ScalarEvolutionWrapperPass>().getSE(), false);
+      OS, info.get(), getAnalysis<ScalarEvolutionWrapperPass>().getSE(),
+      getAnalysis<LoopInfoWrapperPass>().getLoopInfo(), false);
 }
 
 PreservedAnalyses
@@ -244,7 +433,7 @@ DependenceAnalysisPrinterPass::run(Function &F, FunctionAnalysisManager &FAM) {
      << "':\n";
   dumpExampleDependence(OS, &FAM.getResult<DependenceAnalysis>(F),
                         FAM.getResult<ScalarEvolutionAnalysis>(F),
-                        NormalizeResults);
+                        FAM.getResult<LoopAnalysis>(F), NormalizeResults);
   return PreservedAnalyses::all();
 }
 
@@ -670,6 +859,70 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
   return false;
 }
 
+//===----------------------------------------------------------------------===//
+// SCEVMonotonicity
+
+SCEVMonotonicity::SCEVMonotonicity(SCEVMonotonicityType Type,
+                                   const SCEV *FailurePoint)
+    : Type(Type), FailurePoint(FailurePoint) {
+  assert(
+      ((Type == SCEVMonotonicityType::Unknown) == (FailurePoint != nullptr)) &&
+      "FailurePoint must be provided iff Type is Unknown");
+}
+
+void SCEVMonotonicity::print(raw_ostream &OS, unsigned Depth) const {
+  OS.indent(Depth) << "Monotonicity: ";
+  switch (Type) {
+  case SCEVMonotonicityType::Unknown:
+    assert(FailurePoint && "FailurePoint must be provided for Unknown");
+    OS << "Unknown\n";
+    OS.indent(Depth) << "Reason: " << *FailurePoint << "\n";
+    break;
+  case SCEVMonotonicityType::Invariant:
+    OS << "Invariant\n";
+    break;
+  case SCEVMonotonicityType::MultiSignedMonotonic:
+    OS << "MultiSignedMonotonic\n";
+    break;
+  }
+}
+
+bool SCEVMonotonicityChecker::isLoopInvariant(const SCEV *Expr) const {
+  return !OutermostLoop || SE->isLoopInvariant(Expr, OutermostLoop);
+}
+
+SCEVMonotonicity SCEVMonotonicityChecker::invariantOrUnknown(const SCEV *Expr) {
+  if (isLoopInvariant(Expr))
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  return createUnknown(Expr);
+}
+
+SCEVMonotonicity
+SCEVMonotonicityChecker::checkMonotonicity(const SCEV *Expr,
+                                           const Loop *OutermostLoop) {
+  assert(Expr->getType()->isIntegerTy() && "Expr must be integer type");
+  this->OutermostLoop = OutermostLoop;
+  return visit(Expr);
+}
+
+SCEVMonotonicity
+SCEVMonotonicityChecker::visitAddRecExpr(const SCEVAddRecExpr *Expr) {
+  if (!Expr->isAffine() || !Expr->hasNoSignedWrap())
+    return createUnknown(Expr);
+
+  const SCEV *Start = Expr->getStart();
+  const SCEV *Step = Expr->getStepRecurrence(*SE);
+
+  SCEVMonotonicity StartMon = visit(Start);
+  if (StartMon.isUnknown())
+    return StartMon;
+
+  if (!isLoopInvariant(Step))
+    return createUnknown(Expr);
+
+  return SCEVMonotonicity(SCEVMonotonicityType::MultiSignedMonotonic);
+}
+
 //===----------------------------------------------------------------------===//
 // DependenceInfo methods
 
@@ -3479,10 +3732,19 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
   // resize Pair to contain as many pairs of subscripts as the delinearization
   // has found, and then initialize the pairs following the delinearization.
   Pair.resize(Size);
+  SCEVMonotonicityChecker MonChecker(SE);
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
   for (int I = 0; I < Size; ++I) {
     Pair[I].Src = SrcSubscripts[I];
     Pair[I].Dst = DstSubscripts[I];
     unifySubscriptType(&Pair[I]);
+
+    if (EnableMonotonicityCheck) {
+      if (MonChecker.checkMonotonicity(Pair[I].Src, OutermostLoop).isUnknown())
+        return false;
+      if (MonChecker.checkMonotonicity(Pair[I].Dst, OutermostLoop).isUnknown())
+        return false;
+    }
   }
 
   return true;
@@ -3815,6 +4077,14 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
   Pair[0].Src = SrcEv;
   Pair[0].Dst = DstEv;
 
+  SCEVMonotonicityChecker MonChecker(SE);
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
+  if (EnableMonotonicityCheck)
+    if (MonChecker.checkMonotonicity(Pair[0].Src, OutermostLoop).isUnknown() ||
+        MonChecker.checkMonotonicity(Pair[0].Dst, OutermostLoop).isUnknown())
+      return std::make_unique<Dependence>(Src, Dst,
+                                          SCEVUnionPredicate(Assume, *SE));
+
   if (Delinearize) {
     if (tryDelinearize(Src, Dst, Pair)) {
       LLVM_DEBUG(dbgs() << "    delinearized\n");
diff --git a/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll b/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll
new file mode 100644
index 0000000000000..7a72755bcaf2f
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll
@@ -0,0 +1,174 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output -passes="print<da>" -da-dump-monotonicity-report \
+; RUN:     -da-enable-monotonicity-check 2>&1 | FileCheck %s
+
+; int8_t offset = start;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[sext(offset)] = 0;
+;
+define void @sext_nsw(ptr %a, i8 %start, i8 %step) {
+; CHECK-LABEL: 'sext_nsw'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: {(sext i8 %start to i64),+,(sext i8 %step to i64)}<nsw><%loop>
+; CHECK-NEXT:      Monotonicity: MultiSignedMonotonic
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ %start, %entry ], [ %offset.next, %loop ]
+  %offset.sext = sext i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.sext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add nsw i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; The addition for `%offset.next` can wrap, so we cannot prove monotonicity.
+;
+; int8_t offset = start;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[sext(offset)] = 0;
+;
+define void @sext_may_wrap(ptr %a, i8 %start, i8 %step) {
+; CHECK-LABEL: 'sext_may_wrap'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (sext i8 {%start,+,%step}<%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (sext i8 {%start,+,%step}<%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ %start, %entry ], [ %offset.next, %loop ]
+  %offset.sext = sext i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.sext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; for (int8_t i = 0; i < 100; i++)
+;   a[zext(offset)] = 0;
+;
+define void @zext_pos(ptr %a) {
+; CHECK-LABEL: 'zext_pos'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: {0,+,1}<nuw><nsw><%loop>
+; CHECK-NEXT:      Monotonicity: MultiSignedMonotonic
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i8 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset.zext = zext nneg i8 %i to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i8 %i, 1
+  %exitcond = icmp eq i8 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; The zero-extened value of `offset` is no longer monotonic. In fact, the
+; values of `offset` in each iteration are:
+;
+;    iteration |   0 | 1 | 2 | ...
+; -------------|-----|---|---|---------
+;       offset |  -1 | 0 | 1 | ...
+; zext(offset) | 255 | 0 | 1 | ...
+;
+;
+; for (int8_t i = -1; i < 100; i++)
+;   a[zext(offset)] = 0;
+;
+define void @zext_cross_zero(ptr %a) {
+; CHECK-LABEL: 'zext_cross_zero'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (zext i8 {-1,+,1}<nsw><%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (zext i8 {-1,+,1}<nsw><%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i8 [ -1, %entry ], [ %i.inc, %loop ]
+  %offset.zext = zext nneg i8 %i to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i8 %i, 1
+  %exitcond = icmp eq i8 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; In principle, we can prove that `zext(offset)` is monotonic since we know
+; that `offset` is non-negative.
+;
+; int8_t offset = 0;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[zext(offset)] = 0;
+;
+define void @zext_nneg_nsw(ptr %a, i8 %step) {
+; CHECK-LABEL: 'zext_nneg_nsw'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (zext i8 {0,+,%step}<nsw><%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (zext i8 {0,+,%step}<nsw><%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ 0, %entry ], [ %offset.next, %loop ]
+  %offset.zext = zext nneg i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add nsw i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll b/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll
new file mode 100644
index 0000000000000..8f45dfa3af5dd
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll
@@ -0,0 +1,150 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output -passes="print<da>" -da-dump-monotonicity-report \
+; RUN:     -da-enable-monotonicity-check 2>&1 | FileCheck %s
+
+; for (int i = 0; i < n; i++)
+;   a[x] = 0;
+define void @single_loop_invariant(ptr %a, i64 %x, i64 %n) {
+; CHECK-LABEL: 'single_loop_invariant'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: %x
+; CHECK-NEXT:      Monotonicity: Invariant
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - consistent output [S]!
+;
+entry:
+  %guard = icmp sgt i64 %n, 0
+  br i1 %guard, label %loop, label %exit
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %idx = getelementptr inbounds i8, ptr %a, i64 %x
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %exitcond = icmp eq i64 %i.inc, %n
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; for (int i = 0; i < n; i++)
+;   a[(i % 2 == 0 ? x : y)] = 0;
+define void @single_loop_variant(ptr %a, i64 %x, i64 %y, i64 %n) {
+; CHECK-LABEL: 'single_loop_variant'
+; CHECK-...
[truncated]
I will start looking into this tomorrow, and would like to do a proper review of the main algorithm. Please bear with me as I may be interrupted by other stuff. Thanks a lot.
if (EnableMonotonicityCheck)
  if (MonChecker.checkMonotonicity(Pair[0].Src, OutermostLoop).isUnknown() ||
      MonChecker.checkMonotonicity(Pair[0].Dst, OutermostLoop).isUnknown())
I have a basic question about these two tests here: if we have an AddRec with an nsw flag, that means the AddRec doesn't wrap. Why is that not enough, and why do we need to recursively check each component of the AddRec?
I guess the flags from SCEV assume all the internal components are fixed and only the top-level calculation doesn't overflow? Is that correct?
In that case you may want to have a test case where the top-level AddRec has nsw, but monotonicity fails. I didn't see that in your tests, but in other test files we have examples of that. It would be helpful to add one.
However, the example that I see is this loop (the first test in SameSDLoops.ll):
;;  for (long int i = 0; i < 10; i++) {
;;    for (long int j = 0; j < 10; j++) {
;;      for (long int k = 0; k < 10; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i][j][k][l] = i;
;;      }
;;      for (long int k = 1; k < 11; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i + 4][j + 3][k + 2][l + 1] = l;
It is strange that we cannot prove monotonicity here:
Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>
I have a basic question about these two tests here: If we have an AddRec with a nsw flag, that means this AddRec doesn't wrap. Why that is not enough and we need to recursively check each component of AddRec?
I guess the flags from SCEV assume all the internal components are fixed and only the top level calculation doesn't overflow? Is that correct?
In my understanding, your guess is correct. I added a test case @outer_loop_may_wrap, which I believe demonstrates the scenario where only the outer addrec is guaranteed not to wrap.
However the example that I see is this loop (the first test in SameSDLoops.ll)

;;  for (long int i = 0; i < 10; i++) {
;;    for (long int j = 0; j < 10; j++) {
;;      for (long int k = 0; k < 10; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i][j][k][l] = i;
;;      }
;;      for (long int k = 1; k < 11; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i + 4][j + 3][k + 2][l + 1] = l;

It is strange that we cannot prove monotonicity here:

Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>
I don't know much about how nowrap flags are transferred from IR to SCEV, but this appears to be a limitation of SCEV. At a glance, it’s not obvious that the second store  A[i + 4][j + 3][k + 2][l + 1] = l is always executed when entering the j-loop. This may be the reason why the nowrap flags for %for.cond4.preheader are not preserved in SCEV.
Anyway, for this specific case, I think we could perform additional cheap analysis similar to range analysis in SCEV, since all values except the induction variables are constants. That said, I'm not planning to include such a feature in this PR.
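For what it's worth, here is a minimal sketch of what such min/max bookkeeping could look like, assuming the usual ScalarEvolution/APInt headers. The function name and structure are illustrative only; nothing below is part of the patch.

// Rough sketch (not part of the patch) of the "cheap range analysis" idea:
// when every step and every backedge-taken count in a nested affine AddRec is
// a constant, accumulate the minimum and maximum value the expression can
// reach and verify that both still fit in the signed range of its type, i.e.
// that no signed wrap is possible.
static bool addRecStaysInSignedRange(const SCEV *Expr, ScalarEvolution &SE) {
  unsigned BW = SE.getTypeSizeInBits(Expr->getType());
  // Do the bookkeeping in a wider bit width so it cannot itself overflow.
  unsigned WideBW = 2 * std::max(BW, 64u);
  APInt Min(WideBW, 0), Max(WideBW, 0);

  const SCEV *Cur = Expr;
  while (const auto *AR = dyn_cast<SCEVAddRecExpr>(Cur)) {
    const auto *Step = dyn_cast<SCEVConstant>(AR->getStepRecurrence(SE));
    const auto *BTC =
        dyn_cast<SCEVConstant>(SE.getBackedgeTakenCount(AR->getLoop()));
    if (!AR->isAffine() || !Step || !BTC)
      return false;
    // This loop contributes between 0 and Step * BTC to the final value.
    APInt Contrib =
        Step->getAPInt().sext(WideBW) * BTC->getAPInt().zext(WideBW);
    (Contrib.isNegative() ? Min : Max) += Contrib;
    // For the nested form produced by delinearization, the start of an inner
    // AddRec is either an outer AddRec or a constant.
    Cur = AR->getStart();
  }

  const auto *Start = dyn_cast<SCEVConstant>(Cur);
  if (!Start)
    return false;
  Min += Start->getAPInt().sext(WideBW);
  Max += Start->getAPInt().sext(WideBW);

  // No signed wrap iff both extremes fit in the original bit width.
  return Min.sge(APInt::getSignedMinValue(BW).sext(WideBW)) &&
         Max.sle(APInt::getSignedMaxValue(BW).sext(WideBW));
}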
Just an FYI: I also started looking into this, but need a bit of time to get up to speed with it.
if (EnableMonotonicityCheck) {
  if (MonChecker.checkMonotonicity(Pair[I].Src, OutermostLoop).isUnknown())
    return false;
  if (MonChecker.checkMonotonicity(Pair[I].Dst, OutermostLoop).isUnknown())
    return false;
}
Another question here, otherwise LGTM.
If we have multiple subscripts and all of them are monotonic, how could the other monotonicity check (line 4083-4) fail? We need to answer this to make sure we are not running the check redundantly.
Consider the following case (godbolt):
; char A[][32];
; for (i = 0; i < 1ll << 62; i++)
;   for (j = 0; j < 32; j++)
;     if (i < (1ll << 57))
;       A[i][j] = 0;
define void @outer_loop_may_wrap(ptr %a) {
entry:
  br label %loop.i.header
loop.i.header:
  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.i.latch ]
  br label %loop.j.header
loop.j.header:
  %j = phi i64 [ 0, %loop.i.header ], [ %j.inc, %loop.j.latch ]
  %cond = icmp slt i64 %i, 144115188075855872  ; 2^57
  br i1 %cond, label %if.then, label %loop.j.latch
if.then:
  %gep = getelementptr inbounds [32 x i8], ptr %a, i64 %i, i64 %j
  store i8 0, ptr %gep
  br label %loop.j.latch
loop.j.latch:
  %j.inc = add nuw nsw i64 %j, 1
  %ec.j = icmp eq i64 %j.inc, 32
  br i1 %ec.j, label %loop.i.latch, label %loop.j.header
loop.i.latch:
  %i.inc = add nuw nsw i64 %i, 1
  %ec.i = icmp eq i64 %i.inc, 4611686018427387904  ; 2^62
  br i1 %ec.i, label %exit, label %loop.i.header
exit:
  ret void
}

The subscripts {0,+,1}<nuw><nsw><%loop.i.header> and {0,+,1}<nuw><nsw><%loop.j.header> are monotonic, but the original offset {{0,+,32}<%loop.i.header>,+,1}<nw><%loop.j.header> is not.
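A quick arithmetic check of why (my own addition, not part of the original comment):

max(i) = 2^62 - 1 and max(j) = 31, so
max(32*i + j) = 32*(2^62 - 1) + 31 = 2^67 - 1 > 2^63 - 1 = INT64_MAX

i.e. the combined offset {{0,+,32}<%loop.i.header>,+,1}<%loop.j.header> exceeds the signed 64-bit range over the full iteration space, even though each individual subscript stays far below it.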
Added the above test case.
I went through other and older merge requests and issues to see how we got here, and I think I am now mostly up to speed. Just wanted to share some high-level observations first before I look more into the details.

I think it all started with #148435 and the observation that and an access with

The idea of monotonicity to capture the behaviour that we are not iterating through the whole iteration space again and again makes perfect sense.

At this point, the following is also unclear to me:

I.e., I do not see how unsigned monotonicity is going to be different, but I need to think a bit more about this. Related to this, I am also not in love with the name

Having looked into this better now, I actually do agree that it is a bit questionable whether this belongs in SCEV or here in DA. I think we are describing properties of a SCEV, so SCEV is the first thing that would come to my mind, but I am also okay with continuing here and developing this in-tree while it is off by default. It looks all self-contained and easy to move if we wanted to do this later.

Going to look a bit deeper into this now, but wanted to leave this nit while I am doing that: I liked your example in #148435 (comment) to understand this. If I am not mistaken, you didn't include it here, but it may be interesting to add to your tests.
I just added that case. I'll follow up with more detailed comments shortly.
/// The expression is a (nested) affine AddRec and is monotonically increasing
/// or decreasing in a signed sense with respect to each loop. Monotonicity is
/// checked independently for each loop, and the expression is classified as
/// MultiSignedMonotonic if all AddRecs are nsw. For example, in the following
When I brought up monotonicity, I did not mean to apply it only to AddRec expressions (which already have a FlagNW property), but to SCEVs in general. An expression could overflow while none of its operands does. For instance

for (int i = 0; i < INT_MAX; ++i)
  A[i + 2];

Additions usually get folded into the SCEVAddRecExpr, but other operations are not, such as UDiv, URem, UMax, UMin, ... . That is,

for (int i = 0; i < INT_MAX; ++i)
  A[i/2];

is monotonic.

It is fine if we want to handle only outermost AddRecs at first, but the comment implies this is about AddRec expressions only. The goal was to ensure that the closed range [Expr(first-iteration), Expr(last-iteration)] (or [Expr(last-iteration), Expr(first-iteration)]) describes the entire range of the expression, i.e. there is no i for which Expr(i) falls outside that range. The name "monotonic" came from the fact that last-iteration could be any iteration (SCEV does not know when the loop will actually terminate), and the range would therefore be the range of values Expr could have evaluated to so far, leading to a (not necessarily strictly) increasing or decreasing monotonic function.

For multiple iteration variables, the range could be [Expr(i_first, j_first) .. Expr(i_last, j_last)] (or [Expr(i_last, j_last) .. Expr(i_first, j_first)]), or the combinatorial

ExtremeValueList = {Expr(i_first, j_first), Expr(i_first, j_last), Expr(i_last, j_first), Expr(i_last, j_last)};
Range := [min(ExtremeValueList), max(ExtremeValueList)]

According to the definition of MultiSignedMonotonic, it would be the latter. In my reading of https://en.wikipedia.org/wiki/Monotonic_function, functional analysis would use the first, topology the latter definition.

If on the other side you understand SCEVMonotonicityType as the same as FlagNSW, but taking all loops of the nest into account (not just the one stored in the SCEVAddRecExpr), I would suggest not calling that property "monotonicity".
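To connect this to the patch as posted (my own observation, not from the thread): with the current visitor, the subscript of A[i/2] becomes a SCEVUDivExpr, which goes through invariantOrUnknown and is therefore classified as Unknown, even though as a function of i it never wraps:

F(i) = i / 2 (unsigned division), i in [0, INT_MAX)
F is non-decreasing in i, and its values stay within [0, (INT_MAX - 1) / 2],
so the closed range [F(first-iteration), F(last-iteration)] already covers every
value F can take; it just is not an AddRec, so the current check cannot prove it.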
At first, I was thinking of handling various cases like the ones you mentioned, but now I think it's fine to focus on AddRecs for the time being. I don't have a strong preference, so changing the name seems reasonable to me (it's also a bit questionable that Invariant is included in SCEVMonotonicityType in the first place, as you said in #162280 (comment)).
I'm thinking of renaming it to something like MultivariateWrapType. Since I'm not good at naming, I'd be happy if you have a better idea.
As for the definition, the latter one seems to match my intention.
ExtremeValueList := {Expr(i_first, j_first), Expr(i_first, j_last), Expr(i_last, j_first), Expr(i_last, j_last)};
Range := [min(ExtremeValueList), max(ExtremeValueList)]
IsMonotonic(Expr) := Expr(i, j) is in Range for all i in [i_first, i_last] and j in [j_first, j_last]
(since I slacked off on studying, I don't really understand topology...)
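As a concrete instance of that definition (my own worked example, not from the thread), take F(i, j) = i - j with i, j in [0, 99], i.e. the A[i - j] loop from the patch comment:

ExtremeValueList := {F(0, 0), F(0, 99), F(99, 0), F(99, 99)} = {0, -99, 99, 0}
Range := [-99, 99]

Every F(i, j) with 0 <= i, j <= 99 lies in [-99, 99], so F satisfies the definition, even though it is increasing in i and decreasing in j.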
In the loop below, the monotonicity check fails. This is not a correctness issue, and I am not sure if there will be any realistic performance issue caused by this or not, but I thought to bring it up.

Talking about the performance impact of the monotonicity check, the example that I gave before is more concerning: loops like this are common in Fortran programs. We probably need to look into this, and if the missing flag is not a bug, then the question is how we can safely check dependency for this.

I don't think this issue prevents us from merging this patch, since this is currently the best solution that we know to address some correctness issues. But anyway, this should be investigated. I think we can do at least an initial investigation on our end.
Yeah, we are talking about the SCEV wrapping behavior here, not C/C++ (or other languages) specific behavior.

As for the sign extension, I'm not entirely sure if it's problematic, but it's certain that there are some issues in DA with the handling of sign/zero extensions. I think at least

The fundamental problem is a bit more serious. We need to pay attention to wrapping, even if a value doesn't iterate through the entire space. For instance,

Then the minimum value of

As for the naming, I believe I wrote the intention of the name
It is one of the largest issues in the current implementation. A simpler case like the one below cannot be handled well, since

for (size_t i = 0; i < N; i++)
  A[i] = 0;

I haven't looked into the details of how nowrap flags are inferred in SCEV, but I guess SCEV fails to prove that the second store is executed unconditionally [1]. In fact, inserting a conditional branch before the first store (like this) also causes the nowrap flag to be dropped from the SCEV for the first store, and the monotonicity check fails. Regardless of the cause, I think it should be handled on the DA side. In this specific case, all the variables involved, except for the induction variables, are constants, so I believe it shouldn't be too difficult to prove that they don't wrap. Just holding a possible minimum/maximum value while checking

Footnotes
Even for a much simpler IR we have the same problem:

Source code:
Report:
Command line:

I don't know how the IR of the test case in SameSDLoops.ll is generated and why the first loop there is OK. I suspect that if we start solving this kind of problem in DA, we will eventually reinvent many SCEV wheels in DA.
It's strange to me that

By limiting the scope to cases where the BTC and the operands of the AddRecs are constants, I believe it shouldn't be very complex. If we want to make (potentially significant) changes to SCEV for the sake of DA to support more complex cases, then I think we should first get DA into the default pipeline and demonstrate its usefulness...
On some 64-bit platforms, the higher bits of an address are ignored. Sometimes it is even possible to use that to store additional data (see the TBI feature in AArch64). I believe on x86 we have a similar situation, but the top bits need to follow a canonical form. Doesn't this allow us to assume a correct program won't have overflow in address calculation? (EDIT: Same thing for subscripts. They cannot be arbitrary 64-bit values. The effective bit-width should be smaller, and I suspect that may allow us to ignore any concern about AddRec value wrapping.)
/// The type of monotonicity of a SCEV. This property is defined with respect to
/// the outermost loop that DA is analyzing.
///
/// This is designed to classify the behavior of AddRec expressions, and does
/// not care about other SCEVs. For example, given the two loop-invariant values
/// `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.
Using Monotonic is fine if the documentation/definition reflects what it is meant to be, even if it only implements it for AddRecs atm. That would give a clear path for how it could be extended.
This property is defined with respect to the outermost loop that DA is analyzing.
Could be understood as FlagNSW of the outermost loop only, but I think you mean wrapping of any nested loop.
This is designed to classify the behavior of AddRec expressions, and does not care about other SCEVs.
The current doxygen for SCEVMonotonicityType says it is only for AddRecs, and mixes monotonicity and wrapping (I think we consider monotonicity to imply no-wrapping, so if a wrapping AddRec is found it cannot be monotonic, but the other way around may not be true, e.g. with a non-affine SCEVAddRecExpr). Only caring about AddRecs seems arbitrary. Why? What is the property we want to ensure? Could you use a clearer definition?
For example, given the two loop-invariant values `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.

I think this is easier to explain with monotonicity, which is always relative to the variates, the loop induction variables in this case. A and B are just constants (so A + B also evaluates to just a constant, even if it is the result of an overflow, and could have been hoisted out of the loop), i.e. the function over which we define monotonicity is f_{A,B}(i), where i is the variate.
Yeah, it's clearly mixing up the definition and the implementation. The definition itself doesn't need to be limited to AddRecs. I'll rewrite the definition.
I think we consider monotonicity to imply no-wrapping, so if a wrapping AddRec is found it cannot be monotonic
Correct, that's what I tried to describe.
For example, given the two loop-invariant values `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.

I think this is easier to explain with monotonicity, which is always relative to the variates, the loop induction variables in this case. `A` and `B` are just constants (so `A + B` also evaluates to just a constant, even if it is the result of an overflow, and could have been hoisted out of the loop), i.e. the function over which we define monotonicity is f_{A,B}(i), where i is the variate.
Do you mean that monotonicity is not defined for constants in the first place?
Revised the definition of monotonicity. I believe it's better than before...
Do you mean that monotonicity is not defined for constants in the first place?

Constants, in the sense of a parametric function, are not the subject of the monotonicity property, only the function argument/variate is. It is the question "when modifying x, how does c (

For loop nests, c?", or answer with the most pessimistic result for any Unknown, i.e. using a forall quantifier. For the

For DA, it just means that within the scope of the analysis, a and b are invariant/constant, and a + b is just some value, no matter how it was computed. It could also be an unknown function g(a,b), as long as we know that the result of g only depends on its arguments.
Thanks, I think I got it. It seems that I was mixing up parameters and arguments for a parametric function.
LGTM, please still consider the others' remarks.
/// sense. Note that the multimonotonic function may also be a constant
/// function. The order employed in the definition of monotonicity is not
/// strict order.
MultivariateSignedMonotonic,
Used "multimonotonic" in the description, but "Multivariate monotonic" here. Consistency?
Unified to "multivariate monotonic".
My mistake. I think this makes the situation worse, not better. For a platform that ignores the upper 8 bits of a 64-bit pointer, even in a simple loop like this

Every time the lower 56 bits of

I don't have objections to merging this patch. This works well if the effective pointer size is 32 or 64. But I think this issue needs to be discussed further, independent of this patch (for example, I suspect we can write test cases that are vectorized but are not legal, and it is a question for me how important it is to fix all these unlikely corner cases -- EDIT: in particular, could this kind of bug be a security issue?).
/// F(i_1, ..., i_{k-1}, x, i_{k+1}, ..., i_N) <=
/// F(i_1, ..., i_{k-1}, y, i_{k+1}, ..., i_N)
///
/// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, y in their domains.
Feels like a verb is missing in this sentence, maybe something like:
- /// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, y in their domains.
+ /// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, and y are elements of their
+ /// respective domains.
Thanks, fixed.
/// F(i_1, ..., i_{k-1}, x, i_{k+1}, ..., i_N) >=
/// F(i_1, ..., i_{k-1}, y, i_{k+1}, ..., i_N)
///
/// A function F with either monotonically increasing or decreasing with
- /// A function F with either monotonically increasing or decreasing with
+ /// A function F that is monotonically increasing or decreasing with
Fixed.
Excuse: I mimicked this phrase from Wikipedia:
"A function with either property is called strictly monotone."
Probably I need to learn English harder...
/// "monotonic with respect to k-th loop".
///
/// A function F is said to be "multimonotonic" when it is monotonic with
/// respect to all of the N loops.
This is for my own understanding, but I guess it could be generally beneficial: can you elaborate here on why multimonotonic is important? E.g. why is this required, and for what?
Since this is just a definition, I'd prefer to avoid adding such context here. Instead, I added a brief description to SCEVMonotonicityChecker.
/// sense. Note that the inequality "x <= y" merely indicates loop progression
/// and is not affected by the difference between signed and unsigned order.
///
/// Currently we only consider monotonicity in a signed sense.
A lot of things have been discussed in the comments. I need to catch up on some, but that is my problem. What I was going to ask is this: I really like the clear descriptions so far, but can we add, or is it worth explaining a little bit more, the algorithm that determines monotonicity? That would be a high-level description of course, but I feel the explanation of the concepts of looking at the AddRec and the nowrap flags etc. is missing a little bit.
Added some more comments.
I think you should definitely do that, and raise issues for them. That is the first step to discussing this further, I think, if we can find such examples.

But this code is not portable, it is not language conforming? Thus, this example and its behaviour are tied to this particular platform, but the behaviour isn't specified and could be anything. Reading past the array is definitely a security problem, but I am not yet connecting the dots here with the SCEV behaviour.
I'm not sure if this answers your question, but you might want to take a look at the LangRef for details on how address calculations are interpreted in LLVM. I'm also not entirely sure, but it might be true that we should pay a bit more attention to the pointer index type, which we should be able to get from functions like
We are not accessing any out-of-bounds memory in the above example. I am not sure whether this violates any language standard or not; I will look into it. The fundamental issue is that two different 64-bit addresses may point to the same memory location, at least on AArch64. On x86 the situation may be different, since the upper bits of an address have to be in a canonical form. I will do some more investigation on x86 as well, and then open an issue to discuss the implications.

Regarding the bug, I haven't checked vectorization yet, but I have another bug. Basically alias analysis thinks

Compile with O0 and O3. The results are different.
Thanks for the example, @amehsan.
You are conflating the physical machine behavior with the abstract machine model. As far as LLVM is concerned, doing an access at address

Using TBI in a way that is compatible with LLVM's memory model is actually quite tricky (*), but I don't think this is really relevant to what this PR is doing.

(*) One way to model this is to destroy the old provenance and allocate a fresh one whenever the TBI bits change. This effectively means that from an AM perspective, the object only ever lives at a single location at any given time.
You are right. I missed that. Adding provenance to the picture, I suspect there might be a more permissive algorithm for monotonicity. I need to do some more work on it. Will post an update by Monday.
I explain the main idea with an example here. If this is correct and can be made to work, it will be independent of this patch and probably something that can be implemented earlier in the pipeline. Consider the following loop. Also this part of the LangRef about

The value of

Essentially we prove that

(EDIT: Unless I hear something that undermines the whole idea for some reason, we will follow up on this separately as an independent feature. RFC, etc.)

(*) AArch64 and x86 machines use less than 64 bits for virtual addresses, as discussed before. A machine that uses e.g. 56 bits for addressing cannot allocate an object of size larger than

(**) I am not sure DataLayout is the best place. The bound could be
As mentioned above, I think this patch should be merged independently of the idea that I explained.
@amehsan The size of allocated objects is already limited to at most half the address space, see https://llvm.org/docs/LangRef.html#allocated-objects. The kind of reasoning you do is indeed valid (*), and something that LAA already does, see:

llvm-project/llvm/lib/Analysis/LoopAccessAnalysis.cpp, lines 1026 to 1033 in ef87da0

However, this reasoning critically depends on the assumption that the GEP feeds into a memory access that will be executed. The way this is handled in LAA is not entirely correct, see the fix in #161445. We can't use this reasoning in conditionally executed code.

(*) It may not be valid in multi-provenance models -- but that's mostly an argument against having a multi-provenance model...
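To spell out the flavour of that reasoning (my own paraphrase, not a description of the exact LAA code):

assume: A is a single allocated object, so size(A) <= 2^63 - 1 bytes (the LangRef limit)
assume: the access A[i] (element size S) is actually executed on a given iteration
  => the byte offset i * S stays within the object on that iteration
  => |i * S| < 2^63, so the offset, and the AddRec feeding the inbounds GEP,
     fits in a signed 64-bit integer and cannot have wrapped on that iteration

The argument only covers iterations on which the access really executes, which is why it cannot be applied to conditionally executed code.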
OK, I am not sure if pushing down this upper bound could help or not.

Correct.

Two comments on this: From a very quick glance, I suspect there might be some bugs in this code. If the offset of the GEP is a function

I believe this would be more useful somewhere else, not in LAA or DA. For example, if this reasoning can infer nsw/nuw flags for the induction variable of a loop, that might have wider benefit than implementing it in LAA or DA. Another candidate could be in SCEV.
After working on a few examples, I think I need to look more closely into the existing LAA code. Thanks for the information.
Since there don't seem to be any objections, I will go ahead and land the changes. Thanks for the reviews!
@amehsan JFYI: This commit message really helped me understand the nusw-based inference in LAA.
Thanks. I am wondering about cases like this:

Where your offset spans a much smaller interval (so we cannot rely on UB from the object size). Somehow this is taken care of in vectorization. I believe that happens before we reach this part of the LAA code, and so here we don't need to worry about it. For this case, we check and if n is larger than a threshold (which depends on the size of
| 
 Seems correct. In my understanding, 
 I guess  | 
| 
 Yes, it is probably provably correct for affine AddRecs, and affine AddRecs should cover most practically interesting cases. We'll look into implementing this logic earlier in the pipeline so it can help DA and potentially other passes too. | 
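For contrast, here is a hypothetical subscript that is not an affine AddRec (names are mine): `%j` accumulates the induction variable, so it takes the values 0, 0, 1, 3, 6, ... and its SCEV is a quadratic AddRec along the lines of `{0,+,0,+,1}`, which the affine-only argument above would not cover.

```llvm
; Hypothetical illustration: %j accumulates %i, so the subscript SCEV is a
; non-affine (quadratic) AddRec rather than the affine {start,+,step} form
; the discussion above is restricted to.
define void @nonaffine_subscript(ptr %a, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %j = phi i64 [ 0, %entry ], [ %j.next, %loop ]
  %gep = getelementptr inbounds i32, ptr %a, i64 %j
  store i32 0, ptr %gep
  %j.next = add nsw i64 %j, %i
  %i.next = add nuw nsw i64 %i, 1
  %exit.cond = icmp slt i64 %i.next, %n
  br i1 %exit.cond, label %loop, label %exit

exit:
  ret void
}
```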
* [flang] Fix standalone build regression from llvm#161179 (llvm#164309) Fix incorrect linking and dependencies introduced in llvm#161179 that break standalone builds of Flang. Signed-off-by: Michał Górny <[email protected]> * [AMDGPU] Remove magic constants from V_PK_ADD_F32 pattern. NFC (llvm#164335) * [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (llvm#161638) They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+. This breaks transitivity however, for example if we have the following sequence of events in one thread: - some stores - store atomic release syncscope("workgroup") - barrier then another thread follows with - barrier - load atomic acquire - store atomic release syncscope("agent") It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves. We also cannot strengthen our release fences any further to allow for releasing other wave's stores because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It'd also add yet another level of complexity to code sequences, with both acquire/release having CU-mode only alternatives. Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq, or the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance. So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical (vL0 inv is not needed in CU mode) code for release (or stronger) atomic ordering. This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed. Supersedes llvm#160501 Solves SC1-6454 * [InstSimplify] Support ptrtoaddr in simplifyGEPInst() (llvm#164262) This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) - ptrtoaddr(p) -> p2` fold. This fold requires that p and p2 have the same underlying object (otherwise the provenance may not be the same). The argument I would like to make here is that because the underlying objects are the same (and the pointers in the same address space), the non-address bits of the pointer must be the same. Looking at some specific cases of underlying object relationship: * phi/select: Trivially true. * getelementptr: Only modifies address bits, non-address bits must remain the same. * addrspacecast round-trip cast: Must preserve all bits because we optimize such round-trip casts away. * non-interposable global alias: I'm a bit unsure about this one, but I guess the alias and the aliasee must have the same non-address bits? * various intrinsics like launder.invariant.group, ptrmask. I think these all either preserve all pointer bits (like the invariant.group ones) or at least the non-address bits (like ptrmask). There are some interesting cases like amdgcn.make.buffer.rsrc, but those are cross address-space. ----- There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform in this function, which I am not extending to handle ptrtoaddr, adding negative tests instead. This transform is overall dubious for provenance reasons, but especially dubious with ptrtoaddr, as then we don't have the guarantee that provenance of `p` has been exposed. 
* [Hexagon] Add REQUIRES: asserts to test This test uses -debug-only, so needs an assertion-enabled build. * [AArch64] Combing scalar_to_reg into DUP if the DUP already exists (llvm#160499) If we already have a dup(x) as part of the DAG along with a scalar_to_vec(x), we can re-use the result of the dup to the scalar_to_vec(x). * [CAS] OnDiskGraphDB - fix MSVC "not all control paths return a value" warnings. NFC. (llvm#164369) * Reapply "[libc++] Optimize __hash_table::erase(iterator, iterator)" (llvm#162850) This reapplication fixes the use after free caused by not properly updating the bucket list in one case. Original commit message: Instead of just calling the single element `erase` on every element of the range, we can combine some of the operations in a custom implementation. Specifically, we don't need to search for the previous node or re-link the list every iteration. Removing this unnecessary work results in some nice performance improvements: ``` ----------------------------------------------------------------------------------------------------------------------- Benchmark old new ----------------------------------------------------------------------------------------------------------------------- std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0 457 ns 459 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32 995 ns 626 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024 18196 ns 7995 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192 124722 ns 70125 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0 456 ns 461 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32 1183 ns 769 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024 27827 ns 18614 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192 266681 ns 226107 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0 455 ns 462 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32 996 ns 659 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024 15963 ns 8108 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192 136493 ns 71848 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0 454 ns 455 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32 985 ns 703 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024 16277 ns 9085 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192 125736 ns 82710 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0 457 ns 454 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32 1091 ns 646 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024 17784 ns 7664 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192 127098 ns 72806 ns ``` This reverts commit acc3a62. 
* [TableGen] List the indices of sub-operands (llvm#163723) Some instances of the `Operand` class used in Tablegen instruction definitions expand to a cluster of multiple operands at the MC layer, such as complex addressing modes involving base + offset + shift, or clusters of operands describing conditional Arm instructions or predicated MVE instructions. There's currently no convenient way for C++ code to know the offset of one of those sub-operands from the start of the cluster: instead it just hard-codes magic numbers like `index+2`, which is hard to read and fragile. This patch adds an extra piece of output to `InstrInfoEmitter` to define those instruction offsets, based on the name of the `Operand` class instance in Tablegen, and the names assigned to the sub-operands in the `MIOperandInfo` field. For example, if target Foo were to define def Bar : Operand { let MIOperandInfo = (ops GPR:$first, i32imm:$second); // ... } then the new constants would be `Foo::SUBOP_Bar_first` and `Foo::SUBOP_Bar_second`, defined as 0 and 1 respectively. As an example, I've converted some magic numbers related to the MVE predication operand types (`vpred_n` and its superset `vpred_r`) to use the new named constants in place of the integer literals they previously used. This is more verbose, but also clearer, because it explains why the integer is chosen instead of what its value is. * [lldb] Add bidirectional packetLog to gdbclientutils.py (llvm#162176) While debugging the tests for llvm#155000 I found it helpful to have both sides of the simulated gdb-rsp traffic rather than just the responses so I've extended the packetLog in MockGDBServerResponder to record traffic in both directions. Tests have been updated accordingly * [MLIR] [Vector] Added canonicalizer for folding from_elements + transpose (llvm#161841) ## Description Adds a new canonicalizer that folds `vector.from_elements(vector.transpose))` => `vector.from_elements`. This canonicalization reorders the input elements for `vector.from_elements`, adjusts the output shape to match the effect of the transpose op and eliminating its need. ## Testing Added a 2D vector lit test that verifies the working of the rewrite. --------- Signed-off-by: Keshav Vinayak Jha <[email protected]> * [DA] Add initial support for monotonicity check (llvm#162280) The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of *monotonicity* and a validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories: - Unknown: Nothing is known about the monotonicity of the SCEV. - Invariant: The SCEV is loop-invariant. - MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest. The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity. The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient. Split off from llvm#154527. 
* [VPlan] Use VPlan::getRegion to shorten code (NFC) (llvm#164287) * [VPlan] Improve code using m_APInt (NFC) (llvm#161683) * [SystemZ] Avoid trunc(add(X,X)) patterns (llvm#164378) Replace with trunc(add(X,Y)) to avoid premature folding in upcoming patch llvm#164227 * [clang][CodeGen] Emit `llvm.tbaa.errno` metadata during module creation Let Clang emit `llvm.tbaa.errno` metadata in order to let LLVM carry out optimizations around errno-writing libcalls to, as long as it is proved the involved memory location does not alias `errno`. Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972. * [LV][NFC] Remove undef from phi incoming values (llvm#163762) Split off from PR llvm#163525, this standalone patch replaces use of undef as incoming PHI values with zero, in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github. * [DA] Add option to enable specific dependence test only (llvm#164245) PR llvm#157084 added an option `da-run-siv-routines-only` to run only SIV routines in DA. This PR replaces that option with a more fine-grained one that allows to select other than SIV routines as well. This option is useful for regression testing of individual DA routines. This patch also reorganizes regression tests that use `da-run-siv-routines-only`. * [libcxx] Optimize `std::generate_n` for segmented iterators (llvm#164266) Part of llvm#102817. This is a natural follow-up to llvm#163006. We are forwarding `std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs c++17), resulting in improved performance for segmented iterators. before: ``` std::generate_n(deque<int>)/32 17.5 ns 17.3 ns 40727273 std::generate_n(deque<int>)/50 25.7 ns 25.5 ns 26352941 std::generate_n(deque<int>)/1024 490 ns 487 ns 1445161 std::generate_n(deque<int>)/8192 3908 ns 3924 ns 179200 ``` after: ``` std::generate_n(deque<int>)/32 11.1 ns 11.0 ns 64000000 std::generate_n(deque<int>)/50 16.1 ns 16.0 ns 44800000 std::generate_n(deque<int>)/1024 291 ns 292 ns 2357895 std::generate_n(deque<int>)/8192 2269 ns 2250 ns 298667 ``` * [BOLT] Check entry point address is not in constant island (llvm#163418) There are cases where `addEntryPointAtOffset` is called with a given `Offset` that points to an address within a constant island. This triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT to crash. This patch adds a check which ignores functions that would add such entry points and warns the user. * [llvm][dwarfdump] Pretty-print DW_AT_language_version (llvm#164222) In both verbose and non-verbose mode we will now use the `llvm::dwarf::LanguageDescription` to turn the version into a human readable string. In verbose mode we also display the raw version code (similar to how we display addresses in verbose mode). To make the version code and prettified easier to distinguish, we print the prettified name in colour (if available), which is consistent with how `DW_AT_language` is printed in colour. 
Before: ``` 0x0000000c: DW_TAG_compile_unit DW_AT_language_name (DW_LNAME_C) DW_AT_language_version (201112) ``` After: ``` 0x0000000c: DW_TAG_compile_unit DW_AT_language_name (DW_LNAME_C) DW_AT_language_version (201112 C11) ``` --------- Signed-off-by: Michał Górny <[email protected]> Signed-off-by: Keshav Vinayak Jha <[email protected]> Co-authored-by: Michał Górny <[email protected]> Co-authored-by: Stanislav Mekhanoshin <[email protected]> Co-authored-by: Pierre van Houtryve <[email protected]> Co-authored-by: Nikita Popov <[email protected]> Co-authored-by: David Green <[email protected]> Co-authored-by: Simon Pilgrim <[email protected]> Co-authored-by: Nikolas Klauser <[email protected]> Co-authored-by: Simon Tatham <[email protected]> Co-authored-by: Daniel Sanders <[email protected]> Co-authored-by: Keshav Vinayak Jha <[email protected]> Co-authored-by: Ryotaro Kasuga <[email protected]> Co-authored-by: Ramkumar Ramachandra <[email protected]> Co-authored-by: Antonio Frighetto <[email protected]> Co-authored-by: David Sherwood <[email protected]> Co-authored-by: Connector Switch <[email protected]> Co-authored-by: Asher Dobrescu <[email protected]> Co-authored-by: Michael Buch <[email protected]>

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases.

This patch introduces the notion of *monotonicity* and validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories:

- Unknown: Nothing is known about the monotonicity of the SCEV.
- Invariant: The SCEV is loop-invariant.
- MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity.

The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient.

Split off from #154527.
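As a rough illustration of the three categories (my own sketch, not a test from the patch; the exact classification of course depends on what SCEV proves for the concrete IR): a loop-invariant subscript maps to Invariant, an affine AddRec whose no-wrap property SCEV can establish is the MultivariateSignedMonotonic candidate, and a subscript whose AddRec carries no `nsw` falls back to Unknown.

```llvm
; Sketch only: three subscripts meant to line up with the three categories
; described above. Whether SCEV attaches <nsw> to a given AddRec depends on
; its own reasoning, so treat the comments as intent, not guaranteed output.
define void @monotonicity_categories(ptr %a, ptr %b, ptr %c, i64 %k, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %j = phi i64 [ 0, %entry ], [ %j.next, %loop ]

  ; Invariant: the subscript %k does not depend on the loop at all.
  %gep.inv = getelementptr inbounds i32, ptr %a, i64 %k
  store i32 0, ptr %gep.inv

  ; MultivariateSignedMonotonic candidate: affine AddRec with an nsw step.
  %gep.mono = getelementptr inbounds i32, ptr %b, i64 %i
  store i32 1, ptr %gep.mono

  ; Unknown: the increment of %j carries no nsw, so its AddRec may wrap.
  %gep.unk = getelementptr i32, ptr %c, i64 %j
  store i32 2, ptr %gep.unk

  %i.next = add nsw i64 %i, 1
  %j.next = add i64 %j, %n
  %cond = icmp slt i64 %i.next, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret void
}
```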