
Conversation

@kasuga-fj
Contributor

@kasuga-fj kasuga-fj commented Oct 7, 2025

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases.

This patch introduces the notion of monotonicity and a validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories:

  • Unknown: Nothing is known about the monotonicity of the SCEV.
  • Invariant: The SCEV is loop-invariant.
  • MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec recursively and checks whether the nsw flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity.
The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient.

Split off from #154527.

Contributor Author

kasuga-fj commented Oct 7, 2025

@kasuga-fj kasuga-fj force-pushed the users/kasuga-fj/da-monotonic-check-0 branch from 611229f to 9bfa9d5 on October 9, 2025 10:51
/// Note that we don't check if the step recurrence can be zero. For
/// example, an AddRec `{0,+,%a}<nsw>` is classified as Monotonic even if
/// `%a` can be zero. That is, the expression can be Invariant.
MultiSignedMonotonic,
Contributor Author

I think this name is not good. Please let me know if you have a better one. (It would be better if the name also implied that the step value is loop invariant.)

Member

MultivariateMonotonic

For a mathematical (monotonic) function, invariants are just constants that do not appear in the function's domain.

Comment on lines +261 to +262
struct SCEVMonotonicityChecker
: public SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity> {
Contributor Author

As for testability, maybe it is better to split the file, like ScalarEvolutionDivision.cpp? Or would it be better to avoid creating separate files unnecessarily?

; if (cond)
; a[i + j] = 0;
;
define void @conditional_store0(ptr %a, i64 %n, i64 %m) {
Contributor Author

According to #159846 (comment), apparently we cannot infer nsw in this case. I still need to check practical cases; it might be serious and the analysis might be too weak...

Contributor

@amehsan amehsan Oct 9, 2025

Just looking at the pseudo-C and your comment above: something like this might be a test case for SCEV, but not for DA. In DA, I believe you should just close your eyes and assume any nsw/nuw flag given to you is correct.

(EDIT: of course it is good to have a test case that checks that when nsw/nuw doesn't exist, we do not prove monotonicity)

Contributor Author

As your EDIT says, this is a case where SCEV cannot infer nsw, so we cannot prove monotonicity; it is not one where we don't trust the given nsw flag. The purpose of this test is to document the limitations of the analysis capabilities.

@@ -0,0 +1,459 @@
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
Contributor Author

For now, I've added all the cases I could think of. Please let me know if there's anything else you're interested in.

@kasuga-fj kasuga-fj marked this pull request as ready for review October 9, 2025 11:12
@llvmbot llvmbot added the llvm:analysis (Includes value tracking, cost tables and constant folding) label Oct 9, 2025
@llvmbot
Member

llvmbot commented Oct 9, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Ryotaro Kasuga (kasuga-fj)

Changes

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. This means that DA cannot analyze AddRecs that may wrap, and should conservatively return Unknown dependence for such cases. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases.

This patch introduces the notion of monotonicity and a validation logic to check whether an AddRec is monotonic. The monotonicity check classifies the subscript of a memory access into one of the following categories:

  • Unknown: Nothing is known about the monotonicity of the subscript.
  • Invariant: The subscript is loop invariant.
  • MultiSignedMonotonic: The subscript is an AddRec, and it does not wrap in a signed sense for any iteration of the loops in the loop nest.

The current validation logic basically searches an AddRec recursively and checks whether the nsw flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity. The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient.

Split off from #154527.


Patch is 38.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162280.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/DependenceAnalysis.cpp (+273-3)
  • (added) llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll (+174)
  • (added) llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll (+150)
  • (added) llvm/test/Analysis/DependenceAnalysis/monotonicity-no-wrap-flags.ll (+459)
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 1f0da8d1830d3..a3134f8571481 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -128,6 +128,18 @@ static cl::opt<bool> RunSIVRoutinesOnly(
              "The purpose is mainly to exclude the influence of those routines "
              "in regression tests for SIV routines."));
 
+// TODO: This flag is disabled by default because it is still under development.
+// Enable it or delete this flag when the feature is ready.
+static cl::opt<bool> EnableMonotonicityCheck(
+    "da-enable-monotonicity-check", cl::init(false), cl::Hidden,
+    cl::desc("Check if the subscripts are monotonic. If it's not, dependence "
+             "is reported as unknown."));
+
+static cl::opt<bool> DumpMonotonicityReport(
+    "da-dump-monotonicity-report", cl::init(false), cl::Hidden,
+    cl::desc(
+        "When printing analysis, dump the results of monotonicity checks."));
+
 //===----------------------------------------------------------------------===//
 // basics
 
@@ -177,13 +189,189 @@ void DependenceAnalysisWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequiredTransitive<LoopInfoWrapperPass>();
 }
 
+namespace {
+
+/// The type of monotonicity of a SCEV. This property is defined with respect to
+/// the outermost loop that DA is analyzing.
+///
+/// This is designed to classify the behavior of AddRec expressions, and does
+/// not care about other SCEVs. For example, given the two loop-invariant values
+/// `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.
+enum class SCEVMonotonicityType {
+  /// The expression is neither loop-invariant nor monotonic (or we fail to
+  /// prove it).
+  Unknown,
+
+  /// The expression is loop-invariant with respect to the outermost loop.
+  Invariant,
+
+  /// The expression is a (nested) affine AddRec and is monotonically increasing
+  /// or decreasing in a signed sense with respect to each loop. Monotonicity is
+  /// checked independently for each loop, and the expression is classified as
+  /// MultiSignedMonotonic if all AddRecs are nsw. For example, in the following
+  /// loop:
+  ///
+  ///   for (i = 0; i < 100; i++)
+  ///     for (j = 0; j < 100; j++)
+  ///       A[i + j] = ...;
+  ///
+  /// The SCEV for `i + j` is classified as MultiSignedMonotonic. On the other
+  /// hand, in the following loop:
+  ///
+  ///   for (i = 0; i < 100; i++)
+  ///     for (j = 0; j <= (1ULL << 63); j++)
+  ///       A[i + j] = ...;
+  ///
+  /// The SCEV for `i + j` is NOT classified as MultiSignedMonotonic, because the
+  /// AddRec for `j` wraps in a signed sense. We don't consider the "direction"
+  /// of each AddRec. For example, in the following loop:
+  ///
+  ///  for (int i = 0; i < 100; i++)
+  ///    for (int j = 0; j < 100; j++)
+  ///      A[i - j] = ...;
+  ///
+  /// The SCEV for `i - j` is classified as MultiSignedMonotonic, even though it
+  /// contains both increasing and decreasing AddRecs.
+  ///
+  /// Note that we don't check if the step recurrence can be zero. For
+  /// example, an AddRec `{0,+,%a}<nsw>` is classified as Monotonic even if
+  /// `%a` can be zero. That is, the expression can be Invariant.
+  MultiSignedMonotonic,
+};
+
+struct SCEVMonotonicity {
+  SCEVMonotonicity(SCEVMonotonicityType Type,
+                   const SCEV *FailurePoint = nullptr);
+
+  SCEVMonotonicityType getType() const { return Type; }
+
+  const SCEV *getFailurePoint() const { return FailurePoint; }
+
+  bool isUnknown() const { return Type == SCEVMonotonicityType::Unknown; }
+
+  void print(raw_ostream &OS, unsigned Depth) const;
+
+private:
+  SCEVMonotonicityType Type;
+
+  /// The subexpression that caused Unknown. Mainly for debugging purposes.
+  const SCEV *FailurePoint;
+};
+
+struct SCEVMonotonicityChecker
+    : public SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity> {
+
+  SCEVMonotonicityChecker(ScalarEvolution *SE) : SE(SE) {}
+
+  /// Check the monotonicity of \p Expr. \p Expr must be integer type. If \p
+  /// OutermostLoop is not null, \p Expr must be defined in \p OutermostLoop or
+  /// one of its nested loops.
+  SCEVMonotonicity checkMonotonicity(const SCEV *Expr,
+                                     const Loop *OutermostLoop);
+
+private:
+  ScalarEvolution *SE;
+
+  /// The outermost loop that DA is analyzing.
+  const Loop *OutermostLoop;
+
+  /// A helper to classify \p Expr as either Invariant or Unknown.
+  SCEVMonotonicity invariantOrUnknown(const SCEV *Expr);
+
+  /// Return true if \p Expr is loop-invariant with respect to the outermost
+  /// loop.
+  bool isLoopInvariant(const SCEV *Expr) const;
+
+  /// A helper to create an Unknown SCEVMonotonicity.
+  SCEVMonotonicity createUnknown(const SCEV *FailurePoint) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Unknown, FailurePoint);
+  }
+
+  SCEVMonotonicity visitAddRecExpr(const SCEVAddRecExpr *Expr);
+
+  SCEVMonotonicity visitConstant(const SCEVConstant *) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  }
+  SCEVMonotonicity visitVScale(const SCEVVScale *) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  }
+
+  // TODO: Handle more cases.
+  SCEVMonotonicity visitZeroExtendExpr(const SCEVZeroExtendExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSignExtendExpr(const SCEVSignExtendExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitAddExpr(const SCEVAddExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitMulExpr(const SCEVMulExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitPtrToIntExpr(const SCEVPtrToIntExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitTruncateExpr(const SCEVTruncateExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUDivExpr(const SCEVUDivExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSMaxExpr(const SCEVSMaxExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUMaxExpr(const SCEVUMaxExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSMinExpr(const SCEVSMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUMinExpr(const SCEVUMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSequentialUMinExpr(const SCEVSequentialUMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUnknown(const SCEVUnknown *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitCouldNotCompute(const SCEVCouldNotCompute *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+
+  friend struct SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity>;
+};
+
+} // anonymous namespace
+
 // Used to test the dependence analyzer.
 // Looks through the function, noting instructions that may access memory.
 // Calls depends() on every possible pair and prints out the result.
 // Ignores all other instructions.
 static void dumpExampleDependence(raw_ostream &OS, DependenceInfo *DA,
-                                  ScalarEvolution &SE, bool NormalizeResults) {
+                                  ScalarEvolution &SE, LoopInfo &LI,
+                                  bool NormalizeResults) {
   auto *F = DA->getFunction();
+
+  if (DumpMonotonicityReport) {
+    SCEVMonotonicityChecker Checker(&SE);
+    OS << "Monotonicity check:\n";
+    for (Instruction &Inst : instructions(F)) {
+      if (!isa<LoadInst>(Inst) && !isa<StoreInst>(Inst))
+        continue;
+      Value *Ptr = getLoadStorePointerOperand(&Inst);
+      const Loop *L = LI.getLoopFor(Inst.getParent());
+      const SCEV *PtrSCEV = SE.getSCEVAtScope(Ptr, L);
+      const SCEV *AccessFn = SE.removePointerBase(PtrSCEV);
+      SCEVMonotonicity Mon = Checker.checkMonotonicity(AccessFn, L);
+      OS.indent(2) << "Inst: " << Inst << "\n";
+      OS.indent(4) << "Expr: " << *AccessFn << "\n";
+      Mon.print(OS, 4);
+    }
+    OS << "\n";
+  }
+
   for (inst_iterator SrcI = inst_begin(F), SrcE = inst_end(F); SrcI != SrcE;
        ++SrcI) {
     if (SrcI->mayReadOrWriteMemory()) {
@@ -235,7 +423,8 @@ static void dumpExampleDependence(raw_ostream &OS, DependenceInfo *DA,
 void DependenceAnalysisWrapperPass::print(raw_ostream &OS,
                                           const Module *) const {
   dumpExampleDependence(
-      OS, info.get(), getAnalysis<ScalarEvolutionWrapperPass>().getSE(), false);
+      OS, info.get(), getAnalysis<ScalarEvolutionWrapperPass>().getSE(),
+      getAnalysis<LoopInfoWrapperPass>().getLoopInfo(), false);
 }
 
 PreservedAnalyses
@@ -244,7 +433,7 @@ DependenceAnalysisPrinterPass::run(Function &F, FunctionAnalysisManager &FAM) {
      << "':\n";
   dumpExampleDependence(OS, &FAM.getResult<DependenceAnalysis>(F),
                         FAM.getResult<ScalarEvolutionAnalysis>(F),
-                        NormalizeResults);
+                        FAM.getResult<LoopAnalysis>(F), NormalizeResults);
   return PreservedAnalyses::all();
 }
 
@@ -670,6 +859,70 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
   return false;
 }
 
+//===----------------------------------------------------------------------===//
+// SCEVMonotonicity
+
+SCEVMonotonicity::SCEVMonotonicity(SCEVMonotonicityType Type,
+                                   const SCEV *FailurePoint)
+    : Type(Type), FailurePoint(FailurePoint) {
+  assert(
+      ((Type == SCEVMonotonicityType::Unknown) == (FailurePoint != nullptr)) &&
+      "FailurePoint must be provided iff Type is Unknown");
+}
+
+void SCEVMonotonicity::print(raw_ostream &OS, unsigned Depth) const {
+  OS.indent(Depth) << "Monotonicity: ";
+  switch (Type) {
+  case SCEVMonotonicityType::Unknown:
+    assert(FailurePoint && "FailurePoint must be provided for Unknown");
+    OS << "Unknown\n";
+    OS.indent(Depth) << "Reason: " << *FailurePoint << "\n";
+    break;
+  case SCEVMonotonicityType::Invariant:
+    OS << "Invariant\n";
+    break;
+  case SCEVMonotonicityType::MultiSignedMonotonic:
+    OS << "MultiSignedMonotonic\n";
+    break;
+  }
+}
+
+bool SCEVMonotonicityChecker::isLoopInvariant(const SCEV *Expr) const {
+  return !OutermostLoop || SE->isLoopInvariant(Expr, OutermostLoop);
+}
+
+SCEVMonotonicity SCEVMonotonicityChecker::invariantOrUnknown(const SCEV *Expr) {
+  if (isLoopInvariant(Expr))
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  return createUnknown(Expr);
+}
+
+SCEVMonotonicity
+SCEVMonotonicityChecker::checkMonotonicity(const SCEV *Expr,
+                                           const Loop *OutermostLoop) {
+  assert(Expr->getType()->isIntegerTy() && "Expr must be integer type");
+  this->OutermostLoop = OutermostLoop;
+  return visit(Expr);
+}
+
+SCEVMonotonicity
+SCEVMonotonicityChecker::visitAddRecExpr(const SCEVAddRecExpr *Expr) {
+  if (!Expr->isAffine() || !Expr->hasNoSignedWrap())
+    return createUnknown(Expr);
+
+  const SCEV *Start = Expr->getStart();
+  const SCEV *Step = Expr->getStepRecurrence(*SE);
+
+  SCEVMonotonicity StartMon = visit(Start);
+  if (StartMon.isUnknown())
+    return StartMon;
+
+  if (!isLoopInvariant(Step))
+    return createUnknown(Expr);
+
+  return SCEVMonotonicity(SCEVMonotonicityType::MultiSignedMonotonic);
+}
+
 //===----------------------------------------------------------------------===//
 // DependenceInfo methods
 
@@ -3479,10 +3732,19 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
   // resize Pair to contain as many pairs of subscripts as the delinearization
   // has found, and then initialize the pairs following the delinearization.
   Pair.resize(Size);
+  SCEVMonotonicityChecker MonChecker(SE);
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
   for (int I = 0; I < Size; ++I) {
     Pair[I].Src = SrcSubscripts[I];
     Pair[I].Dst = DstSubscripts[I];
     unifySubscriptType(&Pair[I]);
+
+    if (EnableMonotonicityCheck) {
+      if (MonChecker.checkMonotonicity(Pair[I].Src, OutermostLoop).isUnknown())
+        return false;
+      if (MonChecker.checkMonotonicity(Pair[I].Dst, OutermostLoop).isUnknown())
+        return false;
+    }
   }
 
   return true;
@@ -3815,6 +4077,14 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
   Pair[0].Src = SrcEv;
   Pair[0].Dst = DstEv;
 
+  SCEVMonotonicityChecker MonChecker(SE);
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
+  if (EnableMonotonicityCheck)
+    if (MonChecker.checkMonotonicity(Pair[0].Src, OutermostLoop).isUnknown() ||
+        MonChecker.checkMonotonicity(Pair[0].Dst, OutermostLoop).isUnknown())
+      return std::make_unique<Dependence>(Src, Dst,
+                                          SCEVUnionPredicate(Assume, *SE));
+
   if (Delinearize) {
     if (tryDelinearize(Src, Dst, Pair)) {
       LLVM_DEBUG(dbgs() << "    delinearized\n");
diff --git a/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll b/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll
new file mode 100644
index 0000000000000..7a72755bcaf2f
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll
@@ -0,0 +1,174 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output -passes="print<da>" -da-dump-monotonicity-report \
+; RUN:     -da-enable-monotonicity-check 2>&1 | FileCheck %s
+
+; int8_t offset = start;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[sext(offset)] = 0;
+;
+define void @sext_nsw(ptr %a, i8 %start, i8 %step) {
+; CHECK-LABEL: 'sext_nsw'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: {(sext i8 %start to i64),+,(sext i8 %step to i64)}<nsw><%loop>
+; CHECK-NEXT:      Monotonicity: MultiSignedMonotonic
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ %start, %entry ], [ %offset.next, %loop ]
+  %offset.sext = sext i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.sext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add nsw i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; The addition for `%offset.next` can wrap, so we cannot prove monotonicity.
+;
+; int8_t offset = start;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[sext(offset)] = 0;
+;
+define void @sext_may_wrap(ptr %a, i8 %start, i8 %step) {
+; CHECK-LABEL: 'sext_may_wrap'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (sext i8 {%start,+,%step}<%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (sext i8 {%start,+,%step}<%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ %start, %entry ], [ %offset.next, %loop ]
+  %offset.sext = sext i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.sext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; for (int8_t i = 0; i < 100; i++)
+;   a[zext(offset)] = 0;
+;
+define void @zext_pos(ptr %a) {
+; CHECK-LABEL: 'zext_pos'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: {0,+,1}<nuw><nsw><%loop>
+; CHECK-NEXT:      Monotonicity: MultiSignedMonotonic
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i8 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset.zext = zext nneg i8 %i to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i8 %i, 1
+  %exitcond = icmp eq i8 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; The zero-extended value of `offset` is no longer monotonic. In fact, the
+; values of `offset` in each iteration are:
+;
+;    iteration |   0 | 1 | 2 | ...
+; -------------|-----|---|---|---------
+;       offset |  -1 | 0 | 1 | ...
+; zext(offset) | 255 | 0 | 1 | ...
+;
+;
+; for (int8_t i = -1; i < 100; i++)
+;   a[zext(offset)] = 0;
+;
+define void @zext_cross_zero(ptr %a) {
+; CHECK-LABEL: 'zext_cross_zero'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (zext i8 {-1,+,1}<nsw><%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (zext i8 {-1,+,1}<nsw><%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i8 [ -1, %entry ], [ %i.inc, %loop ]
+  %offset.zext = zext nneg i8 %i to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i8 %i, 1
+  %exitcond = icmp eq i8 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; In principle, we can prove that `zext(offset)` is monotonic since we know
+; that `offset` is non-negative.
+;
+; int8_t offset = 0;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[zext(offset)] = 0;
+;
+define void @zext_nneg_nsw(ptr %a, i8 %step) {
+; CHECK-LABEL: 'zext_nneg_nsw'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (zext i8 {0,+,%step}<nsw><%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (zext i8 {0,+,%step}<nsw><%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ 0, %entry ], [ %offset.next, %loop ]
+  %offset.zext = zext nneg i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add nsw i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll b/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll
new file mode 100644
index 0000000000000..8f45dfa3af5dd
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll
@@ -0,0 +1,150 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output -passes="print<da>" -da-dump-monotonicity-report \
+; RUN:     -da-enable-monotonicity-check 2>&1 | FileCheck %s
+
+; for (int i = 0; i < n; i++)
+;   a[x] = 0;
+define void @single_loop_invariant(ptr %a, i64 %x, i64 %n) {
+; CHECK-LABEL: 'single_loop_invariant'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: %x
+; CHECK-NEXT:      Monotonicity: Invariant
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - consistent output [S]!
+;
+entry:
+  %guard = icmp sgt i64 %n, 0
+  br i1 %guard, label %loop, label %exit
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %idx = getelementptr inbounds i8, ptr %a, i64 %x
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %exitcond = icmp eq i64 %i.inc, %n
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; for (int i = 0; i < n; i++)
+;   a[(i % 2 == 0 ? x : y)] = 0;
+define void @single_loop_variant(ptr %a, i64 %x, i64 %y, i64 %n) {
+; CHECK-LABEL: 'single_loop_variant'
+; CHECK-...
[truncated]

Contributor

@amehsan amehsan left a comment

I will start looking into this tomorrow, and would like to do a proper review of the main algorithm. Please bear with me as I may be interrupted by other stuff. Thanks a lot.

Comment on lines +4082 to +4084
if (EnableMonotonicityCheck)
if (MonChecker.checkMonotonicity(Pair[0].Src, OutermostLoop).isUnknown() ||
MonChecker.checkMonotonicity(Pair[0].Dst, OutermostLoop).isUnknown())
Contributor

I have a basic question about these two tests here: If we have an AddRec with a nsw flag, that means this AddRec doesn't wrap. Why is that not enough, and why do we need to recursively check each component of the AddRec?

I guess the flags from SCEV assume all the internal components are fixed and only the top level calculation doesn't overflow? Is that correct?

In that case you may want to have a test case where the top-level AddRec has nsw, but monotonicity fails. I didn't see that in your tests, but in other test files we have examples of that. It would be helpful to add that.

Contributor

However the example that I see is this loop (the first test in SameSDLoops.ll)

;;  for (long int i = 0; i < 10; i++) {
;;    for (long int j = 0; j < 10; j++) {
;;      for (long int k = 0; k < 10; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i][j][k][l] = i;
;;      }
;;      for (long int k = 1; k < 11; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i + 4][j + 3][k + 2][l + 1] = l;

It is strange that we cannot prove monotonicity here:

Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>

Contributor Author

@kasuga-fj kasuga-fj Oct 14, 2025

I have a basic question about these two tests here: If we have an AddRec with a nsw flag, that means this AddRec doesn't wrap. Why is that not enough, and why do we need to recursively check each component of the AddRec?

I guess the flags from SCEV assume all the internal components are fixed and only the top level calculation doesn't overflow? Is that correct?

In my understanding, your guess is correct. I added a test case @outer_loop_may_wrap, which I believe demonstrates the scenario where only the outer addrec is guaranteed not to wrap.

However the example that I see is this loop (the first test in SameSDLoops.ll)

;;  for (long int i = 0; i < 10; i++) {
;;    for (long int j = 0; j < 10; j++) {
;;      for (long int k = 0; k < 10; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i][j][k][l] = i;
;;      }
;;      for (long int k = 1; k < 11; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i + 4][j + 3][k + 2][l + 1] = l;

It is strange that we cannot prove monotonicity here:

Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>

I don't know much about how nowrap flags are transferred from IR to SCEV, but this appears to be a limitation of SCEV. At a glance, it’s not obvious that the second store A[i + 4][j + 3][k + 2][l + 1] = l is always executed when entering the j-loop. This may be the reason why the nowrap flags for %for.cond4.preheader are not preserved in SCEV.

Anyway, for this specific case, I think we could perform additional cheap analysis similar to range analysis in SCEV, since all values except the induction variables are constants. That said, I'm not planning to include such a feature in this PR.

@sjoerdmeijer
Collaborator

just an fyi: I also started looking into this, but need a bit of time to get up to speed with this.

Comment on lines +3742 to +3747
if (EnableMonotonicityCheck) {
if (MonChecker.checkMonotonicity(Pair[I].Src, OutermostLoop).isUnknown())
return false;
if (MonChecker.checkMonotonicity(Pair[I].Dst, OutermostLoop).isUnknown())
return false;
}
Contributor

Another question here, otherwise LGTM.

If we have multiple subscripts and all of them are monotonic, how could the other monotonicity check (lines 4083-4) fail? We need to answer this to make sure we are not running the test redundantly.

Contributor Author

Consider the following case (godbolt):

; char A[][32];
; for (i = 0; i < 1ll << 62; i++)
;   for (j = 0; j < 32; j++)
;     if (i < (1ll << 57))
;       A[i][j] = 0;
define void @outer_loop_may_wrap(ptr %a) {
entry:
  br label %loop.i.header

loop.i.header:
  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.i.latch ]
  br label %loop.j.header

loop.j.header:
  %j = phi i64 [ 0, %loop.i.header ], [ %j.inc, %loop.j.latch ]
  %cond = icmp slt i64 %i, 144115188075855872  ; 2^57
  br i1 %cond, label %if.then, label %loop.j.latch

if.then:
  %gep = getelementptr inbounds [32 x i8], ptr %a, i64 %i, i64 %j
  store i8 0, ptr %gep
  br label %loop.j.latch

loop.j.latch:
  %j.inc = add nuw nsw i64 %j, 1
  %ec.j = icmp eq i64 %j.inc, 32
  br i1 %ec.j, label %loop.i.latch, label %loop.j.header

loop.i.latch:
  %i.inc = add nuw nsw i64 %i, 1
  %ec.i = icmp eq i64 %i.inc, 4611686018427387904  ; 2^62
  br i1 %ec.i, label %exit, label %loop.i.header


exit:
  ret void
}

The subscripts {0,+,1}<nuw><nsw><%loop.i.header> and {0,+,1}<nuw><nsw><%loop.j.header> are monotonic, but the original offset {{0,+,32}<%loop.i.header>,+,1}<nw><%loop.j.header> is not.
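
For concreteness: with the bounds above, the linearized byte offset is $32 \cdot i + j \le 32 \cdot (2^{62} - 1) + 31 = 2^{67} - 1$, which exceeds $2^{63} - 1$ (INT64_MAX), so the combined AddRec can wrap in a signed sense even though neither induction variable does on its own.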

Contributor Author

Added the above test case.

@sjoerdmeijer
Collaborator

sjoerdmeijer commented Oct 15, 2025

I went through other and older merge requests and issues to see how we got here, and I think I am now mostly up to speed. Just wanted to share some high-level observations first before I look more into the details.

I think it all started with #148435 and the observation that an access with i%2 results in SCEV i1 {false,+,true}<%loop> that toggles between values 0 and 1. I.e., it iterates through all points in its space, then wraps around. So, just for clarity and completeness, we are thus not necessarily talking about signed/unsigned wrapping behaviour from, say, the C/C++ language point of view, but just the SCEV wrapping behaviour to capture an access pattern like toggling between values 0 and 1 (and of course other similar patterns). I believe the sign extension of this i1 SCEV to i64 is problematic too, but maybe that is separate?
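
For reference, the access pattern in question boils down to something like the following minimal C++ sketch (illustrative only, not taken from any of the linked patches; the function name is made up):

// The subscript i % 2 toggles between 0 and 1 on every iteration, so its
// i1 SCEV {false,+,true}<%loop> wraps on every step, even though nothing
// overflows at the source level.
void toggle(char *a, int n) {
  for (int i = 0; i < n; i++)
    a[i % 2] = 0; // subscript sequence: 0, 1, 0, 1, ...
}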

The idea of monotonicity to capture the behaviour that we are not iterating through the whole iteration space again and again makes perfect sense. At this point, the following is also unclear to me:

Notably, it is still unclear whether we should also have a category for unsigned monotonicity.

I.e., I do not see how unsigned monotonicity is going to be different, but I need to think a bit more about this. Related to this, I am also not in love with the name MultiSignedMonotonic, but I see how "Signed" was suggested in the earlier review, and I see how the different components capture the different aspects here. If we don't need to distinguish between signed/unsigned we could drop Signed but guess this is to be determined. At a minimum, at this point, I would like to see a more crisp definition of the 3 components Multi, Signed, and Monotonicity. It's kind of there, but spread out, and I would like to see it all captured just before the point where MultiSignedMonotonic is defined. Maybe we can just drop Multi if we define this property to hold for all SCEVs? I know this is a little bit bikeshedding, but if these terms are going to stick, it is worth discussing a bit. :-)

Having looked into this better now, I actually do agree that it is a bit questionable whether this belongs in SCEV or here in DA. I think we are describing properties of a SCEV, so SCEV is the first thing that would come to my mind, but I am also okay with continuing here and to develop this in-tree while it is off by default. It looks all self-contained and easy to move if we wanted to do this later.

Going to look a bit deeper into this now, but wanted to leave this nit while I am doing that: I liked your example in #148435 (comment) to understand this. If I am not mistaken, you didn't include it here, but may be interesting to add to your tests.

@kasuga-fj
Contributor Author

Going to look a bit deeper into this now, but wanted to leave this nit while I am doing that: I liked your example in #148435 (comment) to understand this. If I am not mistaken, you didn't include it here, but may be interesting to add to your tests.

I just added that case. I'll follow up with more detailed comments shortly.

/// The expression is a (nested) affine AddRec and is monotonically increasing
/// or decreasing in a signed sense with respect to each loop. Monotonicity is
/// checked independently for each loop, and the expression is classified as
/// MultiSignedMonotonic if all AddRecs are nsw. For example, in the following
Member

@Meinersbur Meinersbur Oct 15, 2025

When I brought up monotonicity, I did not mean to apply it only to AddRec expressions (which already have a FlagNW property), but to SCEVs in general. An expression could overflow while none of its operands does. For instance:

for (int i = 0; i < INT_MAX; ++i)
  A[i + 2];

Additions usually get folded into the SCEVAddRecExpr, but other operations are not, such as UDiv, URem, UMax, UMin, etc. That is,

for (int i = 0; i < INT_MAX; ++i)
  A[i/2];

is monotonic

It is fine if we want to only handle outermost AddRecs at first, but the comment implies this is about AddRec expressions only. The goal was to ensure that the closed range [Expr(first-iteration), Expr(last-iteration)] (or [Expr(last-iteration), Expr(first-iteration)]) describes the entire range of the expression, i.e. there is no i for which Expr(i) falls outside that range. The name "monotonic" came about because last-iteration could be any iteration (SCEV does not know when the loop will actually terminate), and the range would therefore be the range of values Expr could have evaluated to so far, leading to a (not necessarily strictly) (increasing or decreasing) monotonic function.

For multiple iteration variables, the range could be [Expr(i_first, j_first) .. Expr(i_last, j_last)] (or [Expr(i_last, j_last) .. Expr(i_first, j_first)]), or the combinatorial

ExtremeValueList := {Expr(i_first, j_first), Expr(i_first, j_last), Expr(i_last, j_first), Expr(i_last, j_last)};
Range := [min(ExtremeValueList), max(ExtremeValueList)]

According to the definition of MultiSignedMonotonic, it would be the latter. In my reading of https://en.wikipedia.org/wiki/Monotonic_function, functional analysis would use the first, topology the latter definition.

If, on the other side, you understand SCEVMonotonicityType as the same as FlagNSW, but taking all loops of the nest into account (not just the one stored in the SCEVAddRecExpr), I would suggest not calling that property "monotonicity".

Contributor Author

At first, I was thinking of handling various cases like the ones you mentioned, but now I think it's fine to focus on AddRecs for the time being. I don't have a strong preference, so changing the name seems reasonable to me (it's also a bit questionable that Invariant is included in the SCEVMonotonicityType in the first place, as you said in #162280 (comment)).

I'm thinking of renaming it to something like MultivariateWrapType. Since I'm not good at naming, I'd be happy if you have a better idea.

As for the definition, the latter one seems to match my intention.

ExtremeValueList := {Expr(i_first, j_first), Expr(i_first, j_last), Expr(i_last, j_first), Expr(i_last, j_last)};
Range := [min(ExtremeValueList), max(ExtremeValueList)]
IsMonotonic(Expr) := Expr(i, j) is in Range for all i in [i_first, i_last] and j in [j_first, j_last]

(since I slacked off on studying, I don't really understand topology...)

/// Note that we don't check if the step recurrence can be zero. For
/// example, an AddRec `{0,+,%a}<nsw>` is classified as Monotonic even if
/// `%a` can be zero. That is, the expression can be Invariant.
MultiSignedMonotonic,
Member

MultivariateMonotonic

For a mathematical (monotonic) function, invariants are just constants that do not appear in the function's domain.

@amehsan
Contributor

amehsan commented Oct 15, 2025

Notably, it is still unclear whether we should also have a category for unsigned monotonicity

In the loop below

 for (unsigned long long i = (1ULL << 62) + ((1ULL << 62) - 1); i < (1ULL << 63) + 3000; i++) {
    A[i] = 7;
}

; Function Attrs: nofree norecurse nosync nounwind memory(argmem: write) uwtable
define dso_local void @foo(ptr noundef writeonly captures(none) %A, ptr noundef readnone captures(none) %B, i64 noundef %n) local_unnamed_addr #0 {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret void

for.body:                                         ; preds = %entry, %for.body
  %i.03 = phi i64 [ 9223372036854775807, %entry ], [ %inc, %for.body ]
  %arrayidx = getelementptr inbounds nuw i8, ptr %A, i64 %i.03
  store i8 7, ptr %arrayidx, align 1, !tbaa !6
  %inc = add nuw i64 %i.03, 1
  %exitcond.not = icmp eq i64 %inc, -9223372036854772808
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !9
}

The monotonicity check fails:

Monotonicity check:
  Inst:   store i8 7, ptr %arrayidx, align 1, !tbaa !6
    Expr: {9223372036854775807,+,1}<nuw><%for.body>
    Monotonicity: Unknown
    Reason: {9223372036854775807,+,1}<nuw><%for.body>

Src:  store i8 7, ptr %arrayidx, align 1, !tbaa !6 --> Dst:  store i8 7, ptr %arrayidx, align 1, !tbaa !6
  da analyze - confused!

This is not a correctness issue, and I am not sure if there will be any realistic performance issue caused by this or not. But I thought to bring it up.

Talking about the performance impact of the monotonicity check, the example that I gave before is more concerning:

  for (long int i = 0; i < 10; i++) {
    for (long int j = 0; j < 10; j++) {
      for (long int k = 0; k < 10; k++) {
        for (long int l = 0; l < 10; l++)
          A[i][j][k][l] = i;
      }
      for (long int k = 1; k < 11; k++) {
        for (long int l = 0; l < 10; l++)
          A[i + 4][j + 3][k + 2][l + 1] = l;

Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>

Loops like this are common in Fortran programs. We probably need to look into this, and if the missing flag is not a bug, then the question is how we can safely check dependency for this?

I don't think this issue prevents us from merging this patch, since this is currently the best solution that we know to address some correctness issues. But anyways, this should be investigated. I think we can do at least an initial investigation on our end.

@kasuga-fj
Contributor Author

I think it all started with #148435 and the observation that an access with i%2 results in SCEV i1 {false,+,true}<%loop> that toggles between values 0 and 1. I.e., it iterates through all points in its space, then wraps around. So, just for clarity and completeness, we are thus not necessarily talking about signed/unsigned wrapping behaviour from, say, the C/C++ language point of view, but just the SCEV wrapping behaviour to capture an access pattern like toggling between values 0 and 1 (and of course other similar patterns). I believe the sign extension of this i1 SCEV to i64 is problematic too, but maybe that is separate?

Yeah, we are talking about the SCEV wrapping behavior here, not C/C++ (or other languages) specific behavior. As for the sign extension, I'm not entirely sure if it's problematic, but it's certain that there are some issues in DA with handling of sign/zero extensions. I think at least DependenceInfo::removeMatchingExtensions needs to be fixed.

The idea of monotonicity to capture the behaviour that we are not iterating through the whole iteration space again and again makes perfect sense. At this point, the following is also unclear to me:

Notably, it is still unclear whether we should also have a category for unsigned monotonicity.

I.e., I do not see how unsigned monotonicity is going to be different, but I need to think a bit more about this. Related to this, I am also not in love with the name MultiSignedMonotonic, but I see how "Signed" was suggested in the earlier review, and I see how the different components capture the different aspects here. If we don't need to distinguish between signed/unsigned we could drop Signed but guess this is to be determined. At a minimum, at this point, I would like to see a more crisp definition of the 3 components Multi, Signed, and Monotonicity. It's kind of there, but spread out, and I would like to see it all captured just before the point where MultiSignedMonotonic is defined. Maybe we can just drop Multi if we define this property to hold for all SCEVs? I know this is a little bit bikeshedding, but if these terms are going to stick, it is worth discussing a bit. :-)

The fundamental problem is a bit more serious. We need to pay attention to wrapping, even if a value doesn't iterate through the entire space. For instance, symbolicRDIVtest assumes that a*i + c takes its minimum value when i = 0 and its maximum value when i = N-1 (N-1 is the BTC) when a > 0. This assumption is valid only when a*i + c doesn't wrap. For example, consider the following conditions:

  • a = 2^61
  • c = 0
  • N = 6
  • The bitwidth of the integer type is 64
  • All integers are interpreted as signed

Then the minimum value of a*i + c is -2^63 (when i = 4) and the maximum value is 2^62 + 2^61 (when i = 3), so symbolicRDIVtest can produce an incorrect result (a small worked check of these numbers is sketched below). I think this also answers your question about the necessity of distinguishing between signed and unsigned. If the integers are interpreted as unsigned, the minimum value is 0 (when i = 0) and the maximum value is 2^63 + 2^61 (when i = 5). So in this case using the unsigned interpretation might be better, but it would make it difficult to handle negative values.
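
A quick way to check these numbers is the following standalone C++ sketch (illustrative only; it assumes two's-complement wrapping for the signed reinterpretation):

#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t A = 1ULL << 61; // a = 2^61, c = 0, N = 6
  for (uint64_t I = 0; I < 6; ++I) {
    uint64_t U = A * I;                  // unsigned interpretation of a*i + c
    int64_t S = static_cast<int64_t>(U); // signed reinterpretation
    std::printf("i=%llu unsigned=%llu signed=%lld\n",
                (unsigned long long)I, (unsigned long long)U, (long long)S);
  }
  // Signed:   minimum -2^63 at i = 4, maximum 2^62 + 2^61 at i = 3.
  // Unsigned: minimum 0 at i = 0, maximum 2^63 + 2^61 at i = 5.
}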

As for the naming, I believe I wrote the intention of the name MultiSignedMonotonic in the code comments. I'm not confident it's written clearly, so I'd appreciate any feedback.

@kasuga-fj
Contributor Author

kasuga-fj commented Oct 15, 2025

Notably, it is still unclear whether we should also have a category for unsigned monotonicity

In the loop below [...] This is not a correctness issue, and I am not sure if there will be any realistic performance issue caused by this or not. But I thought to bring it up.

It is one of the largest issues in the current implementation. A simpler case like the one below cannot be handled well, since N can be larger than INT64_MAX.

for (size_t i = 0; i < N; i++)
  A[i] = 0;

Talking about the performance impact of the monotonicity check, the example that I gave before is more concerning:

  for (long int i = 0; i < 10; i++) {
    for (long int j = 0; j < 10; j++) {
      for (long int k = 0; k < 10; k++) {
        for (long int l = 0; l < 10; l++)
          A[i][j][k][l] = i;
      }
      for (long int k = 1; k < 11; k++) {
        for (long int l = 0; l < 10; l++)
          A[i + 4][j + 3][k + 2][l + 1] = l;

Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>

Loops like this are common in Fortran programs. We probably need to look into this, and if the missing flag is not a bug, then the question is how we can safely check dependency for this?

I don't think this issue prevents us from merging this patch, since this is currently the best solution that we know to address some correctness issues. But anyways, this should be investigated. I think we can do at least an initial investigation on our end.

I haven't looked into the details of how nowrap flags are inferred in SCEV, but I guess SCEV fails to prove that the second store is executed unconditionally [1]. In fact, inserting a conditional branch before the first store (like this) also causes the nowrap flag to be dropped from the SCEV for the first store, and the monotonicity check fails.

Regardless of the cause, I think it should be handled on the DA side. In this specific case, all the variables involved, except for the induction variables, are constants, so I believe it shouldn't be too difficult to prove that they don't wrap. Just holding a possible minimum/maximum value while checking the nsw flag recursively should be enough (a rough sketch of this idea follows below).
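
To illustrate what such a cheap check could look like when the start, steps, and trip counts are all compile-time constants, here is a rough standalone C++ sketch (the struct and function names are hypothetical, and __builtin_mul_overflow/__builtin_add_overflow are GCC/Clang builtins):

#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// One level of a nested affine AddRec with a constant step and trip count.
struct ConstAddRecLevel {
  int64_t Step;       // e.g. 8000000, 80000, 800, 8 in the example above
  uint64_t TripCount; // number of iterations of the corresponding loop
};

// Returns the signed [min, max] range of the nested AddRec, or std::nullopt
// if computing either extreme would wrap in a signed 64-bit sense.
std::optional<std::pair<int64_t, int64_t>>
computeSignedRange(int64_t Start, const std::vector<ConstAddRecLevel> &Levels) {
  int64_t Min = Start, Max = Start;
  for (const ConstAddRecLevel &L : Levels) {
    if (L.TripCount <= 1)
      continue; // this loop adds nothing beyond its first iteration
    // Total contribution of this loop over all of its iterations.
    int64_t Total;
    if (__builtin_mul_overflow(L.TripCount - 1, L.Step, &Total))
      return std::nullopt;
    if (Total >= 0) {
      if (__builtin_add_overflow(Max, Total, &Max))
        return std::nullopt;
    } else {
      if (__builtin_add_overflow(Min, Total, &Min))
        return std::nullopt;
    }
  }
  return std::make_pair(Min, Max);
}

For the expression above, assuming a trip count of 10 for every loop as in the C source, this gives a maximum of 32242408 + 9*(8000000 + 80000 + 800 + 8) = 104969680, far below INT64_MAX, so the subscript could be accepted despite the missing flags.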

Footnotes

  1. Maybe SCEV intentionally didn’t implement it because of compile-time impact.

@amehsan
Contributor

amehsan commented Oct 15, 2025

I guess SCEV fails to prove that the second store is executed unconditionally

Even for a much simpler IR we have the same problem:

; Function Attrs: nofree norecurse nosync nounwind memory(argmem: write) uwtable
define dso_local void @foo(ptr noundef writeonly captures(none) %A) local_unnamed_addr #0 {
entry:
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret void

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %conv = trunc i64 %indvars.iv to i8
  %arrayidx = getelementptr inbounds nuw [20 x i8], ptr %A, i64 %indvars.iv
  %arrayidx1 = getelementptr inbounds nuw i8, ptr %arrayidx, i64 15
  store i8 %conv, ptr %arrayidx1, align 1, !tbaa !6
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 10
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !9
}

Source code

void foo (char A[][20]) {
  for (int i = 0; i < 10; i++) {
          A[i][15] = i;
  }
}

Report:

Printing analysis 'Dependence Analysis' for function 'foo':
Monotonicity check:
  Inst:   store i8 %conv, ptr %arrayidx1, align 1, !tbaa !6
    Expr: {15,+,20}<%for.body>
    Monotonicity: Unknown
    Reason: {15,+,20}<%for.body>

Src:  store i8 %conv, ptr %arrayidx1, align 1, !tbaa !6 --> Dst:  store i8 %conv, ptr %arrayidx1, align 1, !tbaa !6
  da analyze - confused!

Command line:

opt  -disable-output "-passes=print<da>" -aa-pipeline=basic-aa   -da-enable-monotonicity-check -da-dump-monotonicity-report

I don't know how the IR of the test case in SameSDLoops.ll is generated and why the first loop there is OK.

I suspect if we start solving this kind of problem in DA, we will eventually reinvent many SCEV wheels in DA.

@kasuga-fj
Contributor Author

It's strange to me that getelementptr appears twice. I tried replacing it with a single getelementptr instruction like getelementptr inbounds nuw [20 x i8], ptr %A, i64 %indvars.iv, i64 15 and then nuw/nsw flags were attached as expected.

I suspect if we start solving this kind of problem in DA, we will eventually reinvent many SCEV wheels in DA.

By limiting the scope to cases where the BTC and the operands of the addrecs are constants, I believe it shouldn’t be very complex. If we want to make (potentially significant) changes to SCEV for the sake of DA to support more complex cases, then I think we should first get DA into the default pipeline and demonstrate its usefulness...

Contributor

@amehsan amehsan left a comment

On some 64-bit platforms, the higher bits of an address are ignored. Sometimes it is even possible to use that to store additional data (see the TBI feature in AArch64). I believe in x86 we have a similar situation, but the top bits need to follow a canonical form. Doesn't this allow us to assume a correct program won't have overflow in address calculation? (EDIT: Same thing for subscripts. They cannot be arbitrary 64-bit values. The effective bit-width should be smaller, and I suspect that may allow us to ignore any concern about AddRec value wrapping.)

Comment on lines 194 to 199
/// The type of monotonicity of a SCEV. This property is defined with respect to
/// the outermost loop that DA is analyzing.
///
/// This is designed to classify the behavior of AddRec expressions, and does
/// not care about other SCEVs. For example, given the two loop-invariant values
/// `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.
Member

Using Monotonic is fine if the documentation/definition reflects what it is meant to be, even if it only implements it for AddRecs atm. That would give a clear path for how it could be extended.

This property is defined with respect to the outermost loop that DA is analyzing.

Could be understood as FlagNSW of the outermost loop only, but I think you mean wrapping of any nested loop.

This is designed to classify the behavior of AddRec expressions, and does not care about other SCEVs.

The current doxygen for SCEVMonotonicityType says it is only for AddRecs, and mixes monotonicity and wrapping (I think we consider monotonicity to imply no-wrapping, so if a wrapping AddRec is found it cannot be monotonic, but the other way around may not be true, e.g. with a non-affine SCEVAddRecExpr). Only caring about AddRecs seems arbitrary. Why? What is the property we want to ensure? Could you use a clearer definition?

For example, given the two loop-invariant values A and B, A + B is treated as Invariant even if the addition wraps.

I think this is easier to explain with monotonicity, which is always relative to a variate, the loop induction variables in this case. A and B are just constants (so A + B also evaluates to just a constant, even if it is the result of an overflow, and could have been hoisted out of the loop), i.e. the function over which we defined monotonicity is $f_{A,B}(i)$, where $i$ is the variate.

Contributor Author

Yeah, it's clearly mixing up the definition and the implementation. The definition itself doesn't need to be limited to AddRecs. I'll rewrite the definition.

I think we consider monotonicity to imply no-wrapping, so if a wrapping AddRec is found it cannot be monotonic

Correct, that's what I tried to describe.

For example, given the two loop-invariant values A and B, A + B is treated as Invariant even if the addition wraps.

I think this is easier to explain with monotonicity, which is always relative to its variates, the loop induction variables in this case. A and B are just constants (so A + B also evaluates to just a constant, even if it is the result of an overflow, and could have been hoisted out of the loop), i.e. the function over which we defined monotonicity is $f_{A,B}(i)$, where $i$ is the variate.

Do you mean that monotonicity is not defined for constants in the first place?

Copy link
Contributor Author

Revised the definition of monotonicity. I believe it's better than before...

Copy link
Member

Do you mean that monotonicity is not defined for constants in the first place?

Constants, in the sense of a parametric function, are not the subject of the monotonicity property, only the function argument/variate is. It is the question "when modifying x, how does $f_c(x)$ behave". The answer may depend on $c$, such as $f_c(x)=c*x$ (strictly monotonically decreasing for $c < 0$, constant and hence monotonic for $c = 0$, strictly monotonically increasing for $c > 0$). Like in the case of a derivative, we ask for the slope of $f_c(x)$ at some position $x$. The answer may depend on $c$ ($f'_c(x)=c$), but we are not asking for the slope between infinitesimally close $c$ and $c'$.

For loop nests, $c$ is typically the increment of a loop/stride of an array subscript and therefore usually a literal constant, i.e. known $c$. If $c$ is not known at compile time, the query for monotonicity must either include "for which value of c?", or answer with the most pessimistic result for any $c$, possibly Unknown, i.e. using a forall quantifier. For the $f_c(x)=c*x$ example above, without any wrapping behavior, the answer can be monotonic, as long as we do not care whether it is strict/increasing/decreasing.

For DA, it just means that within the scope of the analysis, $c$ will always have the same value; the behavior of the same LLVM function called with a different value $c'$ is irrelevant. DA does not give you dependency information across function calls (respectively: different values of an invariant variable), only between loop iterations within the same function call. If a and b are invariant/constant, a + b is just some value, no matter how it was computed. It could also be an unknown function g(a,b), as long as we know that the result of g only depends on its arguments.
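
Put in symbols (just to illustrate the same point, nothing from the patch): for a fixed parameter $c$, monotonicity is a statement about the variate only, e.g.

$$ \forall x \le y:\quad f_c(x) = c \cdot x \;\le\; c \cdot y = f_c(y) \qquad \text{whenever } c \ge 0, $$

and how $c$ itself was computed (even via a wrapping addition) does not enter the property.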

Copy link
Contributor Author

Thanks, I think I got it. It seems that I mixed up parameters and arguments of a parametric function.

@kasuga-fj kasuga-fj force-pushed the users/kasuga-fj/da-monotonic-check-0 branch from aceb708 to 83ab4c6 Compare October 16, 2025 16:39
Copy link
Member

@Meinersbur Meinersbur left a comment

LGTM, please still consider the others' remarks

/// sense. Note that the multimonotonic function may also be a constant
/// function. The order employed in the definition of monotonicity is not
/// strict order.
MultivariateSignedMonotonic,
Copy link
Member

Used "multimonotonic" in the description, but "Multivariate monotonic" here. Consistency?

Copy link
Contributor Author

Unified to "multivariate monotonic".

@amehsan
Copy link
Contributor

amehsan commented Oct 17, 2025

However, it's incorrect to assume that the subscript 2 * i never overflows across the entire iteration space.

My mistake. I think this makes the situation worse, not better. For a platform that ignores upper 8 bits of a 64-bit pointer, even in a simple loop like this

 for (long long i = 0 ; i < n; i++)
    A[i] = 0;

Every time the lower 56 bits of i wrap (and for a 64-bit value a 56-bit signed wrap happens 2^9 times), the subscript wraps. This patch can detect 64-bit wraps, but that is a small fraction of all possible 56-bit wraps. For example, see the following code:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t size = 2000;
    unsigned long long T = (1ULL << 58) - 1ULL;
    char *p = malloc(size);
    if (p) {
        p[1] = 98;
        printf("ptr: T = %llx\n", T);
        printf("ptr: %p\n", &p[1]);
        printf("ptr: %p\n", &p[T+2]);
        printf("val: %d\n", p[1]);
        printf("val: %d\n", p[T+2]);
        free(p);
    }
    return 0;
}

output:

T = 3ffffffffffffff
ptr: 0xd1892a1
ptr: 0x40000000d1892a1
val: 98
val: 98

I don't have objections to merging this patch. This works well if the effective pointer size is 32 or 64. But I think this issue needs to be discussed further, independent of this patch (for example, I suspect we can write test cases that are vectorized but are not legal, and it is a question for me how important it is to fix all these unlikely corner cases -- EDIT: in particular, could this kind of bug be a security issue?).

/// F(i_1, ..., i_{k-1}, x, i_{k+1}, ..., i_N) <=
/// F(i_1, ..., i_{k-1}, y, i_{k+1}, ..., i_N)
///
/// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, y in their domains.
Copy link
Collaborator

Feels like a verb is missing in this sentence, maybe something like:

Suggested change
/// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, y in their domains.
/// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, and y are elements of their
/// respective domains.

Copy link
Contributor Author

Thanks, fixed.

/// F(i_1, ..., i_{k-1}, x, i_{k+1}, ..., i_N) >=
/// F(i_1, ..., i_{k-1}, y, i_{k+1}, ..., i_N)
///
/// A function F with either monotonically increasing or decreasing with
Copy link
Collaborator

Suggested change
/// A function F with either monotonically increasing or decreasing with
/// A function F that is monotonically increasing or decreasing with

Copy link
Contributor Author

Fixed.

Excuse: I mimicked this phrase from wikipedia.

A function with either property is called strictly monotone.

Probably I need to learn English harder...

/// "monotonic with respect to k-th loop".
///
/// A function F is said to be "multimonotonic" when it is monotonic with
/// respect to all of the N loops.
Copy link
Collaborator

This is for my own understanding, but I guess it could be generally beneficial: can you elaborate here on why multimonotonic is important? E.g. why is this required and for what?

Copy link
Contributor Author

Since this is just a definition, I'd prefer to avoid adding such context here. Instead, I added a brief description to SCEVMonotonicityChecker.

/// sense. Note that the inequality "x <= y" merely indicates loop progression
/// and is not affected by the difference between signed and unsigned order.
///
/// Currently we only consider monotonicity in a signed sense.
Copy link
Collaborator

A lot of things have been discussed in the comments. I need to catch up on some, but that is my problem. What I was going to ask is this: I really like the clear descriptions so far, but can we add, or is it worth explaining a little bit more, the algorithm that determines monotonicity? That would be a high-level description of course, but I feel an explanation of the concepts of looking at AddRecs and the nowrap flags etc. is missing a little bit.

Copy link
Contributor Author

Added some more comments.
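
For anyone catching up on the thread, the rough shape of the check being discussed is something like the following (an illustrative sketch based on the PR description, not the actual SCEVMonotonicityChecker implementation; the helper name is made up):

#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
using namespace llvm;

enum class Monotonicity { Unknown, Invariant, MultivariateSignedMonotonic };

// Sketch: loop-invariant expressions are Invariant; affine AddRecs that carry
// the nsw flag, whose step is invariant, and whose start is itself Invariant
// or monotonic are treated as multivariate signed monotonic; everything else
// is Unknown.
static Monotonicity classify(const SCEV *Expr, ScalarEvolution &SE,
                             const Loop *OutermostLoop) {
  if (SE.isLoopInvariant(Expr, OutermostLoop))
    return Monotonicity::Invariant;

  const auto *AR = dyn_cast<SCEVAddRecExpr>(Expr);
  if (!AR || !AR->isAffine() || !AR->hasNoSignedWrap())
    return Monotonicity::Unknown;
  if (!SE.isLoopInvariant(AR->getStepRecurrence(SE), OutermostLoop))
    return Monotonicity::Unknown;

  // Recurse into the start so that nested AddRecs (the multivariate case)
  // are also validated.
  if (classify(AR->getStart(), SE, OutermostLoop) == Monotonicity::Unknown)
    return Monotonicity::Unknown;
  return Monotonicity::MultivariateSignedMonotonic;
}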

@sjoerdmeijer
Copy link
Collaborator

(for example, I suspect we can write test cases that are vectorized but they are not legal,

I think you should definitely do that, and raise issues for them. That is the first step to discussing this further, I think, if we can find such examples.

and it is question for me how important is to fix all these unlikely corner cases -- EDIT: in particular could this kind of bug be a security issue?).

But this code is not portable; isn't it non-conforming? Thus, this example and its behaviour are tied to this particular platform, but the behaviour isn't specified and could be anything. Reading past the array is definitely a security problem, but I am not yet connecting the dots here with the SCEV behaviour.

@kasuga-fj
Copy link
Contributor Author

kasuga-fj commented Oct 17, 2025

I don't have objections to merging this patch. This works well if effective pointer size is 32 or 64. But I think this issue needs to be discussed further, independent of this patch (for example, I suspect we can write test cases that are vectorized but they are not legal, and it is question for me how important is to fix all these unlikely corner cases -- EDIT: in particular could this kind of bug be a security issue?).

I'm not sure if this answers your question, but you might want to take a look at the LangRef for details on how address calculations are interpreted in LLVM. I'm also not entirely sure, but it might be true that we should pay a bit more attention to the pointer index type, which should be obtainable from functions like DataLayout::getIndexType.
However, IIUC, since we're obtaining the SCEV from an actual pointer value that exists in the IR and using exact BTCs, I don't think it's much of a problem.

@amehsan
Copy link
Contributor

amehsan commented Oct 17, 2025

reading past the array is definitely a security problem

We are not accessing any out-of-bounds memory in the above example. I am not sure whether this violates any language standard or not. I will look into it.

The fundamental issue is that two different 64-bit addresses may point to the same memory location, at least on AArch64. DataLayout::getIndexSize() and DataLayout::getPointerSize() both return 8. I will check the LangRef further to see if there is anything I'm missing.
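
For reference, the kind of query I mean (a trivial sketch; address space 0 assumed):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// On a typical AArch64 target both values below are 8 bytes, i.e. the pointer
// index type is as wide as the pointer itself, even though the hardware may
// ignore the top byte under TBI.
static void reportPointerWidths(const Module &M) {
  const DataLayout &DL = M.getDataLayout();
  errs() << "pointer size: " << DL.getPointerSize(/*AS=*/0) << " bytes\n";
  errs() << "index size:   " << DL.getIndexSize(/*AS=*/0) << " bytes\n";
}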

On x86 the situation may be different, since the upper bits of the address have to be in a canonical form. I will do some more investigation on x86 as well, and then open an issue to discuss the implications.

Regarding the bug, I haven't checked vectorization yet, but I have another bug. Basically, alias analysis thinks p and p + (1ULL << 58) point to two different memory locations, but that is not correct. (I have tried this on two different AArch64 chips from two different vendors.)

Compile with O0 and O3. The results are different.

#include <stdio.h>

__attribute__((noinline))
char foo (char *a) {

  char *b = a + (1ULL << 58);
  *b = *a + 2;
  return *a;

}

int main () {

  char a[4];
  a[0] = 25;
  a[1] = 26;
  a[2] = 27;
  a[3] = 28;
  char t = foo (a);
  printf("result: %d\n", t);

}

@sjoerdmeijer
Copy link
Collaborator

Thanks for the example @amehsan.
Would you mind starting a Discourse thread on this topic? I feel we could use a little help with this one, and also get the language lawyers involved. :-)

@nikic
Copy link
Contributor

nikic commented Oct 17, 2025

You are conflating the physical machine behavior with the abstract machine model. As far as LLVM is concerned, doing an access at address p+(1<<58) is UB, because you do not have provenance to access that address. The fact that, on the underlying hardware, this address maps to the same memory is irrelevant, in the same way that an out of bounds array access is UB even if another object just so happens to be allocated at that address and the access does not actually trap.

Using TBI in a way that is compatible with LLVM's memory model is actually quite tricky (*), but I don't think this is really relevant to what this PR is doing.

(*) One way to model this is to destroy the old provenance and allocate a fresh one whenever the TBI bits change. This effectively means that from an AM perspective, the object only ever lives at a single location at any given time.

@amehsan
Copy link
Contributor

amehsan commented Oct 18, 2025

As far as LLVM is concerned, doing an access at address p+(1<<58) is UB, because you do not have provenance to access that address.

You are right. I missed that. Adding provenance to the picture, I suspect there might be a more permissive algorithm for monotonicity. I need to do some more work on it. Will post an update by Monday.

@amehsan
Copy link
Contributor

amehsan commented Oct 20, 2025

I explain the main idea with an example here. If this is correct and can be made to work, it will be independent of this patch and probably something that can be implemented earlier in the pipeline. Consider the following loop. Also, this part of the LangRef about the inbounds attribute is important here:

The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the pointer index type

#include <stdint.h>

void foo2(char *A, uint64_t n) {
  // Corresponding IR for the loop body:
  // for.body:                                         ; preds = %entry, %for.body
  //   %j.04 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
  //   %conv = trunc i64 %j.04 to i8
  //   %arrayidx = getelementptr inbounds nuw i8, ptr %A, i64 %j.04
  //   store i8 %conv, ptr %arrayidx, align 1, !tbaa !6
  //   %inc = add nuw i64 %j.04, 1
  //   %exitcond = icmp eq i64 %inc, %umax
  //   br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !9
  for (uint64_t j = 0; j <= n; j++) {
    A[j] = (char)j;
  }
}

The value of %j.04 can be any 64-bit integer. In the context of the GEP instruction it should be interpreted as a signed integer that can take values in the range [-2^63, 2^63-1]. I assume we have a 64-bit address space starting from 0 and ending at 2^64 - 1. There are a couple of possibilities for the address of A:

  1. If the address of A is 2^63+X (X is a positive value less than 2^63): Consider j = 2^63 - X. A[j - 1] will address the last memory byte, 2^64-1, and A[j] will wrap the i64 type, in violation of the inbounds rule above. So to avoid UB, there must be an upper bound of at most 2^63 - 1 for j (when j is interpreted as a signed integer).
  2. We can also make a similar argument and prove a bound for j if the address of A is 2^63+X (here X is any negative 64-bit number).
  3. The only remaining case is that the address of A is exactly 2^63. In this case, the values of A[j] can address any memory location in the entire 64-bit address space. To ensure we have provenance to access any memory location, there must have been a single allocated object of size 2^64. While this may be OK for the abstract machine, this is certainly not the case for any 64-bit target that I am aware of (*). It is reasonable to add a new parameter to e.g. DataLayout (or somewhere else) that is an upper bound on the size of a single allocated object (**). If we have a bound smaller than 2^64, then this last case is also impossible.

Essentially we prove that j cannot wrap. Even though the loop conditions allow it, there will be UB somewhere else in the loop if j really takes all possible values: either we violate the conditions of inbounds, or the allocated object would have to be impossibly large.
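
Writing out the arithmetic for case 1 above (my own rendering of the same argument, no new information):

$$
\begin{align*}
A + (j - 1) &= (2^{63} + X) + (2^{63} - X - 1) = 2^{64} - 1 && \text{(last addressable byte)} \\
A + j &= (2^{63} + X) + (2^{63} - X) = 2^{64} \equiv 0 \pmod{2^{64}} && \text{(wraps the pointer index type, violating inbounds)}
\end{align*}
$$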

(EDIT: Unless I hear something that undermines the whole idea for some reason, we will follow up on this separately as an independent feature. RFC, etc.)


(*) AArch64 and x86 machines use fewer than 64 bits for virtual addresses, as discussed before. A machine that uses e.g. 56 bits for addressing cannot allocate an object of size larger than 2^56, simply because it cannot address it entirely.

(**) I am not sure DataLayout is the best place. The bound could be 2^64 by default, but set to a smaller value depending on the -mcpu option. Realistic bounds are definitely smaller, but this is provably correct. It is also possible that the user sets this bound for the program being compiled.

Copy link
Contributor

@amehsan amehsan left a comment

As mentioned above, I think this patch should be merged independently of the idea that I explained.

@nikic
Copy link
Contributor

nikic commented Oct 20, 2025

@amehsan The size of allocated objects is already limited to at most half the address space, see https://llvm.org/docs/LangRef.html#allocated-objects.

The kind of reasoning you do is indeed valid (*), and something that LAA already does, see:

// An nusw getelementptr that is an AddRec cannot wrap. If it would wrap,
// the distance between the previously accessed location and the wrapped
// location will be larger than half the pointer index type space. In that
// case, the GEP would be poison and any memory access dependent on it would
// be immediate UB when executed.
if (auto *GEP = dyn_cast_if_present<GetElementPtrInst>(Ptr);
    GEP && GEP->hasNoUnsignedSignedWrap())
  return true;

However, this reasoning critically depends on the assumption that the GEP feeds into a memory access that will be executed. The way this is handled in LAA is not entirely correct, see the fix in #161445. We can't use this reasoning in conditionally executed code.

(*) It may not be valid in multi-provenance models -- but that's mostly an argument against having a multi-provenance model...

@amehsan
Copy link
Contributor

amehsan commented Oct 20, 2025

The size of allocated objects is already limited to at most half the address space,

OK, I am not sure if pushing down this upper bound could help or not.

this reasoning critically depends on the assumption that the GEP feeds into a memory access that will be executed.

Correct.

something that LAA already does

Two comments on this:

From a very quick glance, I suspect there might be some bugs in this code. If the offset of the GEP is a function F of the induction variable such that the minimum and maximum values in range(F) are not too far from each other, then the conclusion may not be correct. Just having the nusw flag is not enough; we need to check the possible values that the offset can take. I am not sure if LAA performs this check somewhere or not.

I believe this would be more useful somewhere else, not in LAA or DA. For example, if this reasoning can infer nsw/nuw flags for the induction variable of a loop, that might have a wider benefit than implementing it in LAA or DA. Another candidate could be SCEV.

@amehsan
Copy link
Contributor

amehsan commented Oct 21, 2025

something that LAA already does

After working on a few examples, I think I need to look more closely into the existing LAA code. Thanks for the information.

@kasuga-fj
Copy link
Contributor Author

Since there don't seem to be any objections, I will go ahead and land the changes. Thanks for the reviews!

@kasuga-fj kasuga-fj enabled auto-merge (squash) October 21, 2025 08:41
@kasuga-fj kasuga-fj merged commit ab789be into main Oct 21, 2025
9 of 10 checks passed
@kasuga-fj kasuga-fj deleted the users/kasuga-fj/da-monotonic-check-0 branch October 21, 2025 09:11
@kasuga-fj
Copy link
Contributor Author

@amehsan JFYI: This commit message really helped me understand the nusw-based inference in LAA.

@amehsan
Copy link
Contributor

amehsan commented Oct 21, 2025

@amehsan JFYI: This commit message really helped me understand the nusw-based inference in LAA.

Thanks. I am wondering about cases like this:

#include <stdint.h>

void foo2(char *restrict A, uint64_t n, uint64_t mask) {
  for (uint64_t j = 0; j < n; j++) {
    uint16_t k = j; // or: uint32_t k = j;
    A[k] = A[k + 1] + 4;
  }
}

Where your offset spans a much smaller interval (so we cannot rely on UB from the object size). Somehow this is taken care of in vectorization. I believe that happens before we reach this part of the LAA code, so here we don't need to worry about it.

For this case, we check whether n is larger than a threshold (which depends on the size of k), and if so we don't execute the vectorized code. If we complicate the operation on j, the loop will not be vectorized anymore.

@kasuga-fj
Copy link
Contributor Author

Where your offset spans a much smaller interval (so we cannot rely on UB from the object size). Somehow this is taken care of in vectorization. I believe that happens before we reach this part of the LAA code, so here we don't need to worry about it.

Seems correct. In my understanding,

  • Reasoning about the nowrap property from nusw is performed only when the offset is an affine AddRec.
  • In your example, I think the offset will become something like (zext i16 {0,+,1} to i64), which is not an AddRec, so the inference will not be performed.

I guess PredicatedScalarEvolution::getAsAddRec adds some predicates (in this case n < 2^16 or a similar one) to safely regard the SCEV as an affine AddRec.
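
A minimal sketch of what that usage might look like (hypothetical; `Offset` and `L` stand for the subscript value and its loop, and this is not code from LAA):

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
using namespace llvm;

// Hypothetical helper: try to view a non-AddRec offset such as
//   (zext i16 {0,+,1}<%loop> to i64)
// as an affine AddRec by accumulating runtime predicates (e.g. "n < 2^16").
static const SCEVAddRecExpr *tryAsAddRec(ScalarEvolution &SE, Loop &L,
                                         Value *Offset) {
  PredicatedScalarEvolution PSE(SE, L);
  const SCEV *S = PSE.getSCEV(Offset);
  if (const auto *AR = dyn_cast<SCEVAddRecExpr>(S))
    return AR; // Already an AddRec; no predicate needed.
  // Otherwise ask PSE to rewrite it. The no-overflow assumptions it collects
  // are recorded in PSE and must be checked at runtime before relying on the
  // returned AddRec.
  return PSE.getAsAddRec(Offset);
}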

@amehsan
Copy link
Contributor

amehsan commented Oct 21, 2025

only when the offset is an affine addrec

Yes, it is probably provably correct for affine AddRecs, and affine AddRecs should cover most practically interesting cases. We'll look into implementing this logic earlier in the pipeline so it can help DA and potentially other passes too.

ronlieb added a commit to ROCm/llvm-project that referenced this pull request Oct 21, 2025
* [flang] Fix standalone build regression from llvm#161179 (llvm#164309)

Fix incorrect linking and dependencies introduced in llvm#161179 that break
standalone builds of Flang.

Signed-off-by: Michał Górny <[email protected]>

* [AMDGPU] Remove magic constants from V_PK_ADD_F32 pattern. NFC (llvm#164335)

* [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (llvm#161638)

They were previously optimized to not emit any waitcnt, which is
technically correct because there is no reordering of operations at
workgroup scope in CU mode for GFX10+.

This breaks transitivity however, for example if we have the following
sequence of events in one thread:

- some stores
- store atomic release syncscope("workgroup")
- barrier

then another thread follows with

- barrier
- load atomic acquire
- store atomic release syncscope("agent")

It does not work because, while the other thread sees the stores, it
cannot release them at the wider scope. Our release fences aren't strong
enough to "wait" on stores from other waves.

We also cannot strengthen our release fences any further to allow for
releasing other wave's stores because only GFX12 can do that with
`global_wb`. GFX10-11 do not have the writeback instruction.
It'd also add yet another level of complexity to code sequences, with
both acquire/release having CU-mode only alternatives.
Lastly, acq/rel are always used together. The price for synchronization
has to be paid either at the acq, or the rel. Strengthening the releases
would just make the memory model more complex but wouldn't help
performance.

So the choice here is to streamline the code sequences by making CU and
WGP mode emit almost identical (vL0 inv is not needed in CU mode) code
for release (or stronger) atomic ordering.

This also removes the `vm_vsrc(0)` wait before barriers. Now that the
release fence in CU mode is strong enough, it is no longer needed.

Supersedes llvm#160501
Solves SC1-6454

* [InstSimplify] Support ptrtoaddr in simplifyGEPInst() (llvm#164262)

This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) -
ptrtoaddr(p) -> p2` fold.

This fold requires that p and p2 have the same underlying object
(otherwise the provenance may not be the same).

The argument I would like to make here is that because the underlying
objects are the same (and the pointers in the same address space), the
non-address bits of the pointer must be the same. Looking at some
specific cases of underlying object relationship:

 * phi/select: Trivially true.
* getelementptr: Only modifies address bits, non-address bits must
remain the same.
* addrspacecast round-trip cast: Must preserve all bits because we
optimize such round-trip casts away.
* non-interposable global alias: I'm a bit unsure about this one, but I
guess the alias and the aliasee must have the same non-address bits?
* various intrinsics like launder.invariant.group, ptrmask. I think
these all either preserve all pointer bits (like the invariant.group
ones) or at least the non-address bits (like ptrmask). There are some
interesting cases like amdgcn.make.buffer.rsrc, but those are cross
address-space.

-----

There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform
in this function, which I am not extending to handle ptrtoaddr, adding
negative tests instead. This transform is overall dubious for provenance
reasons, but especially dubious with ptrtoaddr, as then we don't have
the guarantee that provenance of `p` has been exposed.

* [Hexagon] Add REQUIRES: asserts to test

This test uses -debug-only, so needs an assertion-enabled build.

* [AArch64] Combing scalar_to_reg into DUP if the DUP already exists (llvm#160499)

If we already have a dup(x) as part of the DAG along with a
scalar_to_vec(x), we can re-use the result of the dup to the
scalar_to_vec(x).

* [CAS] OnDiskGraphDB - fix MSVC "not all control paths return a value" warnings. NFC. (llvm#164369)

* Reapply "[libc++] Optimize __hash_table::erase(iterator, iterator)" (llvm#162850)

This reapplication fixes the use after free caused by not properly
updating the bucket list in one case.

Original commit message:
Instead of just calling the single element `erase` on every element of
the range, we can combine some of the operations in a custom
implementation. Specifically, we don't need to search for the previous
node or re-link the list every iteration. Removing this unnecessary work
results in some nice performance improvements:
```
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                             old           new
-----------------------------------------------------------------------------------------------------------------------
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0                    457 ns        459 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32                   995 ns        626 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024               18196 ns       7995 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192              124722 ns      70125 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0            456 ns        461 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32          1183 ns        769 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024       27827 ns      18614 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192      266681 ns     226107 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0               455 ns        462 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32              996 ns        659 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024          15963 ns       8108 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192         136493 ns      71848 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0               454 ns        455 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32              985 ns        703 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024          16277 ns       9085 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192         125736 ns      82710 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0          457 ns        454 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32        1091 ns        646 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024     17784 ns       7664 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192    127098 ns      72806 ns
```


This reverts commit acc3a62.

* [TableGen] List the indices of sub-operands (llvm#163723)

Some instances of the `Operand` class used in Tablegen instruction
definitions expand to a cluster of multiple operands at the MC layer,
such as complex addressing modes involving base + offset + shift, or
clusters of operands describing conditional Arm instructions or
predicated MVE instructions. There's currently no convenient way for C++
code to know the offset of one of those sub-operands from the start of
the cluster: instead it just hard-codes magic numbers like `index+2`,
which is hard to read and fragile.

This patch adds an extra piece of output to `InstrInfoEmitter` to define
those instruction offsets, based on the name of the `Operand` class
instance in Tablegen, and the names assigned to the sub-operands in the
`MIOperandInfo` field. For example, if target Foo were to define

  def Bar : Operand {
    let MIOperandInfo = (ops GPR:$first, i32imm:$second);
    // ...
  }

then the new constants would be `Foo::SUBOP_Bar_first` and
`Foo::SUBOP_Bar_second`, defined as 0 and 1 respectively.

As an example, I've converted some magic numbers related to the MVE
predication operand types (`vpred_n` and its superset `vpred_r`) to use
the new named constants in place of the integer literals they previously
used. This is more verbose, but also clearer, because it explains why
the integer is chosen instead of what its value is.

* [lldb] Add bidirectional packetLog to gdbclientutils.py (llvm#162176)

While debugging the tests for llvm#155000 I found it helpful to have both
sides
of the simulated gdb-rsp traffic rather than just the responses so I've
extended
the packetLog in MockGDBServerResponder to record traffic in both
directions.
Tests have been updated accordingly

* [MLIR] [Vector] Added canonicalizer for folding from_elements + transpose (llvm#161841)

## Description
Adds a new canonicalizer that folds
`vector.from_elements(vector.transpose))` => `vector.from_elements`.
This canonicalization reorders the input elements for
`vector.from_elements`, adjusts the output shape to match the effect of
the transpose op and eliminating its need.

## Testing
Added a 2D vector lit test that verifies the working of the rewrite.

---------

Signed-off-by: Keshav Vinayak Jha <[email protected]>

* [DA] Add initial support for monotonicity check (llvm#162280)

The dependence testing functions in DA assume that the analyzed AddRec
does not wrap over the entire iteration space. For AddRecs that may
wrap, DA should conservatively return unknown dependence. However, no
validation is currently performed to ensure that this condition holds,
which can lead to incorrect results in some cases.

This patch introduces the notion of *monotonicity* and a validation
logic to check whether a SCEV is monotonic. The monotonicity check
classifies the SCEV into one of the following categories:

- Unknown: Nothing is known about the monotonicity of the SCEV.
- Invariant: The SCEV is loop-invariant.
- MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense
for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec
recursively and checks whether the `nsw` flag is present. Notably, it is
still unclear whether we should also have a category for unsigned
monotonicity.
The monotonicity check is still under development and disabled by
default for now. Since such a check is necessary to make DA sound, it
should be enabled by default once the functionality is sufficient.

Split off from llvm#154527.

* [VPlan] Use VPlan::getRegion to shorten code (NFC) (llvm#164287)

* [VPlan] Improve code using m_APInt (NFC) (llvm#161683)

* [SystemZ] Avoid trunc(add(X,X)) patterns (llvm#164378)

Replace with trunc(add(X,Y)) to avoid premature folding in upcoming patch llvm#164227

* [clang][CodeGen] Emit `llvm.tbaa.errno` metadata during module creation

Let Clang emit `llvm.tbaa.errno` metadata in order to let LLVM
carry out optimizations around errno-writing libcalls to, as
long as it is proved the involved memory location does not alias
`errno`.

Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.

* [LV][NFC] Remove undef from phi incoming values (llvm#163762)

Split off from PR llvm#163525, this standalone patch replaces
 use of undef as incoming PHI values with zero, in order
 to reduce the likelihood of contributors hitting the
 `undef deprecator` warning in github.

* [DA] Add option to enable specific dependence test only (llvm#164245)

PR llvm#157084 added an option `da-run-siv-routines-only` to run only SIV
routines in DA. This PR replaces that option with a more fine-grained
one that allows to select other than SIV routines as well. This option
is useful for regression testing of individual DA routines. This patch
also reorganizes regression tests that use `da-run-siv-routines-only`.

* [libcxx] Optimize `std::generate_n` for segmented iterators (llvm#164266)

Part of llvm#102817.

This is a natural follow-up to llvm#163006. We are forwarding
`std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs
c++17), resulting in improved performance for segmented iterators.

before:

```
std::generate_n(deque<int>)/32          17.5 ns         17.3 ns     40727273
std::generate_n(deque<int>)/50          25.7 ns         25.5 ns     26352941
std::generate_n(deque<int>)/1024         490 ns          487 ns      1445161
std::generate_n(deque<int>)/8192        3908 ns         3924 ns       179200
```

after:

```
std::generate_n(deque<int>)/32          11.1 ns         11.0 ns     64000000
std::generate_n(deque<int>)/50          16.1 ns         16.0 ns     44800000
std::generate_n(deque<int>)/1024         291 ns          292 ns      2357895
std::generate_n(deque<int>)/8192        2269 ns         2250 ns       298667
```

* [BOLT] Check entry point address is not in constant island (llvm#163418)

There are cases where `addEntryPointAtOffset` is called with a given
`Offset` that points to an address within a constant island. This
triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT
to crash. This patch adds a check which ignores functions that would add
such entry points and warns the user.

* [llvm][dwarfdump] Pretty-print DW_AT_language_version (llvm#164222)

In both verbose and non-verbose mode we will now use the
`llvm::dwarf::LanguageDescription` to turn the version into a human
readable string. In verbose mode we also display the raw version code
(similar to how we display addresses in verbose mode). To make the
version code and prettified easier to distinguish, we print the
prettified name in colour (if available), which is consistent with how
`DW_AT_language` is printed in colour.

Before:
```
0x0000000c: DW_TAG_compile_unit                                                                           
              DW_AT_language_name       (DW_LNAME_C)                                                      
              DW_AT_language_version    (201112)             
```
After:
```
0x0000000c: DW_TAG_compile_unit                                                                           
              DW_AT_language_name       (DW_LNAME_C)                                                      
              DW_AT_language_version    (201112 C11)                                                             
```

---------

Signed-off-by: Michał Górny <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Co-authored-by: Michał Górny <[email protected]>
Co-authored-by: Stanislav Mekhanoshin <[email protected]>
Co-authored-by: Pierre van Houtryve <[email protected]>
Co-authored-by: Nikita Popov <[email protected]>
Co-authored-by: David Green <[email protected]>
Co-authored-by: Simon Pilgrim <[email protected]>
Co-authored-by: Nikolas Klauser <[email protected]>
Co-authored-by: Simon Tatham <[email protected]>
Co-authored-by: Daniel Sanders <[email protected]>
Co-authored-by: Keshav Vinayak Jha <[email protected]>
Co-authored-by: Ryotaro Kasuga <[email protected]>
Co-authored-by: Ramkumar Ramachandra <[email protected]>
Co-authored-by: Antonio Frighetto <[email protected]>
Co-authored-by: David Sherwood <[email protected]>
Co-authored-by: Connector Switch <[email protected]>
Co-authored-by: Asher Dobrescu <[email protected]>
Co-authored-by: Michael Buch <[email protected]>
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
The dependence testing functions in DA assume that the analyzed AddRec
does not wrap over the entire iteration space. For AddRecs that may
wrap, DA should conservatively return unknown dependence. However, no
validation is currently performed to ensure that this condition holds,
which can lead to incorrect results in some cases.

This patch introduces the notion of *monotonicity* and a validation
logic to check whether a SCEV is monotonic. The monotonicity check
classifies the SCEV into one of the following categories:

- Unknown: Nothing is known about the monotonicity of the SCEV.
- Invariant: The SCEV is loop-invariant.
- MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense
for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec
recursively and checks whether the `nsw` flag is present. Notably, it is
still unclear whether we should also have a category for unsigned
monotonicity.
The monotonicity check is still under development and disabled by
default for now. Since such a check is necessary to make DA sound, it
should be enabled by default once the functionality is sufficient.

Split off from llvm#154527.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
The dependence testing functions in DA assume that the analyzed AddRec
does not wrap over the entire iteration space. For AddRecs that may
wrap, DA should conservatively return unknown dependence. However, no
validation is currently performed to ensure that this condition holds,
which can lead to incorrect results in some cases.

This patch introduces the notion of *monotonicity* and a validation
logic to check whether a SCEV is monotonic. The monotonicity check
classifies the SCEV into one of the following categories:

- Unknown: Nothing is known about the monotonicity of the SCEV.
- Invariant: The SCEV is loop-invariant.
- MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense
for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec
recursively and checks whether the `nsw` flag is present. Notably, it is
still unclear whether we should also have a category for unsigned
monotonicity.
The monotonicity check is still under development and disabled by
default for now. Since such a check is necessary to make DA sound, it
should be enabled by default once the functionality is sufficient.

Split off from llvm#154527.
Labels

llvm:analysis Includes value tracking, cost tables and constant folding

7 participants