[DA] Add initial support for monotonicity check #162280
Conversation
/// Note that we don't check if the step recurrence can be zero. For
/// example,an AddRec `{0,+,%a}<nsw> is classifed as Monotonic if `%a` can be
/// zero. That is, the expression can be Invariant.
MultiSignedMonotonic,
I think this name is not good. Please let me know if you have a better one. (It would be better if the name also implied that the step value is loop invariant.)
MultivariateMonotonic
For a mathematical (monotonic) function, invariants are just constants that do not appear in the function's domain.
struct SCEVMonotonicityChecker
    : public SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity> {
As for testability, maybe it is better to split this out into a separate file, like ScalarEvolutionDivision.cpp? Or would it be better to avoid creating separate files unnecessarily?
; if (cond)
;   a[i + j] = 0;
;
define void @conditional_store0(ptr %a, i64 %n, i64 %m) {
According to #159846 (comment), apparently we cannot infer nsw in this case. I still need to check practical cases; this might be serious, and the analysis might turn out to be too weak...
Just looking at the pseudo-C and your comment above: something like this might be a test case for SCEV, but not for DA. In DA I believe you should just close your eyes and assume any nsw/nuw flag given to you is correct.
(EDIT: of course it is good to have a test case that checks that when nsw/nuw doesn't exist, we do not prove monotonicity.)
As your EDIT says, this is a case where SCEV cannot infer nsw, so we cannot prove monotonicity; it is not a case where we don't trust a given nsw flag. The purpose of this test is to document the limitations of the analysis.
@@ -0,0 +1,459 @@
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
For now, I've added all the cases I could think of. Please let me know if there's anything else you're interested in.
@llvm/pr-subscribers-llvm-analysis

Author: Ryotaro Kasuga (kasuga-fj)

Changes

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. This means that DA cannot analyze AddRecs that may wrap, and should conservatively return Unknown dependence for such cases. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of monotonicity and a validation logic to check whether an AddRec is monotonic. The monotonicity check classifies the subscript of a memory access into one of the following categories:

The current validation logic basically searches an AddRec recursively and checks whether the

Split off from #154527.

Patch is 38.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162280.diff

4 Files Affected:
 diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 1f0da8d1830d3..a3134f8571481 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -128,6 +128,18 @@ static cl::opt<bool> RunSIVRoutinesOnly(
              "The purpose is mainly to exclude the influence of those routines "
              "in regression tests for SIV routines."));
 
+// TODO: This flag is disabled by default because it is still under development.
+// Enable it or delete this flag when the feature is ready.
+static cl::opt<bool> EnableMonotonicityCheck(
+    "da-enable-monotonicity-check", cl::init(false), cl::Hidden,
+    cl::desc("Check if the subscripts are monotonic. If it's not, dependence "
+             "is reported as unknown."));
+
+static cl::opt<bool> DumpMonotonicityReport(
+    "da-dump-monotonicity-report", cl::init(false), cl::Hidden,
+    cl::desc(
+        "When printing analysis, dump the results of monotonicity checks."));
+
 //===----------------------------------------------------------------------===//
 // basics
 
@@ -177,13 +189,189 @@ void DependenceAnalysisWrapperPass::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequiredTransitive<LoopInfoWrapperPass>();
 }
 
+namespace {
+
+/// The type of monotonicity of a SCEV. This property is defined with respect to
+/// the outermost loop that DA is analyzing.
+///
+/// This is designed to classify the behavior of AddRec expressions, and does
+/// not care about other SCEVs. For example, given the two loop-invariant values
+/// `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.
+enum class SCEVMonotonicityType {
+  /// The expression is neither loop-invariant nor monotonic (or we fail to
+  /// prove it).
+  Unknown,
+
+  /// The expression is loop-invariant with respect to the outermost loop.
+  Invariant,
+
+  /// The expression is a (nested) affine AddRec and is monotonically increasing
+  /// or decreasing in a signed sense with respect to each loop. Monotonicity is
+  /// checked independently for each loop, and the expression is classified as
+  /// MultiSignedMonotonic if all AddRecs are nsw. For example, in the following
+  /// loop:
+  ///
+  ///   for (i = 0; i < 100; i++)
+  ///     for (j = 0; j < 100; j++)
+  ///       A[i + j] = ...;
+  ///
+  /// The SCEV for `i + j` is classified as MultiSignedMonotonic. On the other
+  /// hand, in the following loop:
+  ///
+  ///   for (i = 0; i < 100; i++)
+  ///     for (j = 0; j <= (1ULL << 63); j++)
+  ///       A[i + j] = ...;
+  ///
+  /// The SCEV for `i + j` is NOT classified as MultiMonotonic, because the
+  /// AddRec for `j` wraps in a signed sense. We don't consider the "direction"
+  /// of each AddRec. For example, in the following loop:
+  ///
+  ///  for (int i = 0; i < 100; i++)
+  ///    for (int j = 0; j < 100; j++)
+  ///      A[i - j] = ...;
+  ///
+  /// The SCEV for `i - j` is classified as MultiSignedMonotonic, even though it
+  /// contains both increasing and decreasing AddRecs.
+  ///
+  /// Note that we don't check if the step recurrence can be zero. For
+  /// example,an AddRec `{0,+,%a}<nsw> is classifed as Monotonic if `%a` can be
+  /// zero. That is, the expression can be Invariant.
+  MultiSignedMonotonic,
+};
+
+struct SCEVMonotonicity {
+  SCEVMonotonicity(SCEVMonotonicityType Type,
+                   const SCEV *FailurePoint = nullptr);
+
+  SCEVMonotonicityType getType() const { return Type; }
+
+  const SCEV *getFailurePoint() const { return FailurePoint; }
+
+  bool isUnknown() const { return Type == SCEVMonotonicityType::Unknown; }
+
+  void print(raw_ostream &OS, unsigned Depth) const;
+
+private:
+  SCEVMonotonicityType Type;
+
+  /// The subexpression that caused Unknown. Mainly for debugging purpose.
+  const SCEV *FailurePoint;
+};
+
+struct SCEVMonotonicityChecker
+    : public SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity> {
+
+  SCEVMonotonicityChecker(ScalarEvolution *SE) : SE(SE) {}
+
+  /// Check the monotonicity of \p Expr. \p Expr must be integer type. If \p
+  /// OutermostLoop is not null, \p Expr must be defined in \p OutermostLoop or
+  /// one of its nested loops.
+  SCEVMonotonicity checkMonotonicity(const SCEV *Expr,
+                                     const Loop *OutermostLoop);
+
+private:
+  ScalarEvolution *SE;
+
+  /// The outermost loop that DA is analyzing.
+  const Loop *OutermostLoop;
+
+  /// A helper to classify \p Expr as either Invariant or Unknown.
+  SCEVMonotonicity invariantOrUnknown(const SCEV *Expr);
+
+  /// Return true if \p Expr is loop-invariant with respect to the outermost
+  /// loop.
+  bool isLoopInvariant(const SCEV *Expr) const;
+
+  /// A helper to create an Unknown SCEVMonotonicity.
+  SCEVMonotonicity createUnknown(const SCEV *FailurePoint) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Unknown, FailurePoint);
+  }
+
+  SCEVMonotonicity visitAddRecExpr(const SCEVAddRecExpr *Expr);
+
+  SCEVMonotonicity visitConstant(const SCEVConstant *) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  }
+  SCEVMonotonicity visitVScale(const SCEVVScale *) {
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  }
+
+  // TODO: Handle more cases.
+  SCEVMonotonicity visitZeroExtendExpr(const SCEVZeroExtendExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSignExtendExpr(const SCEVSignExtendExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitAddExpr(const SCEVAddExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitMulExpr(const SCEVMulExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitPtrToIntExpr(const SCEVPtrToIntExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitTruncateExpr(const SCEVTruncateExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUDivExpr(const SCEVUDivExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSMaxExpr(const SCEVSMaxExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUMaxExpr(const SCEVUMaxExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSMinExpr(const SCEVSMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUMinExpr(const SCEVUMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitSequentialUMinExpr(const SCEVSequentialUMinExpr *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitUnknown(const SCEVUnknown *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+  SCEVMonotonicity visitCouldNotCompute(const SCEVCouldNotCompute *Expr) {
+    return invariantOrUnknown(Expr);
+  }
+
+  friend struct SCEVVisitor<SCEVMonotonicityChecker, SCEVMonotonicity>;
+};
+
+} // anonymous namespace
+
 // Used to test the dependence analyzer.
 // Looks through the function, noting instructions that may access memory.
 // Calls depends() on every possible pair and prints out the result.
 // Ignores all other instructions.
 static void dumpExampleDependence(raw_ostream &OS, DependenceInfo *DA,
-                                  ScalarEvolution &SE, bool NormalizeResults) {
+                                  ScalarEvolution &SE, LoopInfo &LI,
+                                  bool NormalizeResults) {
   auto *F = DA->getFunction();
+
+  if (DumpMonotonicityReport) {
+    SCEVMonotonicityChecker Checker(&SE);
+    OS << "Monotonicity check:\n";
+    for (Instruction &Inst : instructions(F)) {
+      if (!isa<LoadInst>(Inst) && !isa<StoreInst>(Inst))
+        continue;
+      Value *Ptr = getLoadStorePointerOperand(&Inst);
+      const Loop *L = LI.getLoopFor(Inst.getParent());
+      const SCEV *PtrSCEV = SE.getSCEVAtScope(Ptr, L);
+      const SCEV *AccessFn = SE.removePointerBase(PtrSCEV);
+      SCEVMonotonicity Mon = Checker.checkMonotonicity(AccessFn, L);
+      OS.indent(2) << "Inst: " << Inst << "\n";
+      OS.indent(4) << "Expr: " << *AccessFn << "\n";
+      Mon.print(OS, 4);
+    }
+    OS << "\n";
+  }
+
   for (inst_iterator SrcI = inst_begin(F), SrcE = inst_end(F); SrcI != SrcE;
        ++SrcI) {
     if (SrcI->mayReadOrWriteMemory()) {
@@ -235,7 +423,8 @@ static void dumpExampleDependence(raw_ostream &OS, DependenceInfo *DA,
 void DependenceAnalysisWrapperPass::print(raw_ostream &OS,
                                           const Module *) const {
   dumpExampleDependence(
-      OS, info.get(), getAnalysis<ScalarEvolutionWrapperPass>().getSE(), false);
+      OS, info.get(), getAnalysis<ScalarEvolutionWrapperPass>().getSE(),
+      getAnalysis<LoopInfoWrapperPass>().getLoopInfo(), false);
 }
 
 PreservedAnalyses
@@ -244,7 +433,7 @@ DependenceAnalysisPrinterPass::run(Function &F, FunctionAnalysisManager &FAM) {
      << "':\n";
   dumpExampleDependence(OS, &FAM.getResult<DependenceAnalysis>(F),
                         FAM.getResult<ScalarEvolutionAnalysis>(F),
-                        NormalizeResults);
+                        FAM.getResult<LoopAnalysis>(F), NormalizeResults);
   return PreservedAnalyses::all();
 }
 
@@ -670,6 +859,70 @@ bool DependenceInfo::intersectConstraints(Constraint *X, const Constraint *Y) {
   return false;
 }
 
+//===----------------------------------------------------------------------===//
+// SCEVMonotonicity
+
+SCEVMonotonicity::SCEVMonotonicity(SCEVMonotonicityType Type,
+                                   const SCEV *FailurePoint)
+    : Type(Type), FailurePoint(FailurePoint) {
+  assert(
+      ((Type == SCEVMonotonicityType::Unknown) == (FailurePoint != nullptr)) &&
+      "FailurePoint must be provided iff Type is Unknown");
+}
+
+void SCEVMonotonicity::print(raw_ostream &OS, unsigned Depth) const {
+  OS.indent(Depth) << "Monotonicity: ";
+  switch (Type) {
+  case SCEVMonotonicityType::Unknown:
+    assert(FailurePoint && "FailurePoint must be provided for Unknown");
+    OS << "Unknown\n";
+    OS.indent(Depth) << "Reason: " << *FailurePoint << "\n";
+    break;
+  case SCEVMonotonicityType::Invariant:
+    OS << "Invariant\n";
+    break;
+  case SCEVMonotonicityType::MultiSignedMonotonic:
+    OS << "MultiSignedMonotonic\n";
+    break;
+  }
+}
+
+bool SCEVMonotonicityChecker::isLoopInvariant(const SCEV *Expr) const {
+  return !OutermostLoop || SE->isLoopInvariant(Expr, OutermostLoop);
+}
+
+SCEVMonotonicity SCEVMonotonicityChecker::invariantOrUnknown(const SCEV *Expr) {
+  if (isLoopInvariant(Expr))
+    return SCEVMonotonicity(SCEVMonotonicityType::Invariant);
+  return createUnknown(Expr);
+}
+
+SCEVMonotonicity
+SCEVMonotonicityChecker::checkMonotonicity(const SCEV *Expr,
+                                           const Loop *OutermostLoop) {
+  assert(Expr->getType()->isIntegerTy() && "Expr must be integer type");
+  this->OutermostLoop = OutermostLoop;
+  return visit(Expr);
+}
+
+SCEVMonotonicity
+SCEVMonotonicityChecker::visitAddRecExpr(const SCEVAddRecExpr *Expr) {
+  if (!Expr->isAffine() || !Expr->hasNoSignedWrap())
+    return createUnknown(Expr);
+
+  const SCEV *Start = Expr->getStart();
+  const SCEV *Step = Expr->getStepRecurrence(*SE);
+
+  SCEVMonotonicity StartMon = visit(Start);
+  if (StartMon.isUnknown())
+    return StartMon;
+
+  if (!isLoopInvariant(Step))
+    return createUnknown(Expr);
+
+  return SCEVMonotonicity(SCEVMonotonicityType::MultiSignedMonotonic);
+}
+
 //===----------------------------------------------------------------------===//
 // DependenceInfo methods
 
@@ -3479,10 +3732,19 @@ bool DependenceInfo::tryDelinearize(Instruction *Src, Instruction *Dst,
   // resize Pair to contain as many pairs of subscripts as the delinearization
   // has found, and then initialize the pairs following the delinearization.
   Pair.resize(Size);
+  SCEVMonotonicityChecker MonChecker(SE);
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
   for (int I = 0; I < Size; ++I) {
     Pair[I].Src = SrcSubscripts[I];
     Pair[I].Dst = DstSubscripts[I];
     unifySubscriptType(&Pair[I]);
+
+    if (EnableMonotonicityCheck) {
+      if (MonChecker.checkMonotonicity(Pair[I].Src, OutermostLoop).isUnknown())
+        return false;
+      if (MonChecker.checkMonotonicity(Pair[I].Dst, OutermostLoop).isUnknown())
+        return false;
+    }
   }
 
   return true;
@@ -3815,6 +4077,14 @@ DependenceInfo::depends(Instruction *Src, Instruction *Dst,
   Pair[0].Src = SrcEv;
   Pair[0].Dst = DstEv;
 
+  SCEVMonotonicityChecker MonChecker(SE);
+  const Loop *OutermostLoop = SrcLoop ? SrcLoop->getOutermostLoop() : nullptr;
+  if (EnableMonotonicityCheck)
+    if (MonChecker.checkMonotonicity(Pair[0].Src, OutermostLoop).isUnknown() ||
+        MonChecker.checkMonotonicity(Pair[0].Dst, OutermostLoop).isUnknown())
+      return std::make_unique<Dependence>(Src, Dst,
+                                          SCEVUnionPredicate(Assume, *SE));
+
   if (Delinearize) {
     if (tryDelinearize(Src, Dst, Pair)) {
       LLVM_DEBUG(dbgs() << "    delinearized\n");
diff --git a/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll b/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll
new file mode 100644
index 0000000000000..7a72755bcaf2f
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/monotonicity-cast.ll
@@ -0,0 +1,174 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output -passes="print<da>" -da-dump-monotonicity-report \
+; RUN:     -da-enable-monotonicity-check 2>&1 | FileCheck %s
+
+; int8_t offset = start;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[sext(offset)] = 0;
+;
+define void @sext_nsw(ptr %a, i8 %start, i8 %step) {
+; CHECK-LABEL: 'sext_nsw'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: {(sext i8 %start to i64),+,(sext i8 %step to i64)}<nsw><%loop>
+; CHECK-NEXT:      Monotonicity: MultiSignedMonotonic
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ %start, %entry ], [ %offset.next, %loop ]
+  %offset.sext = sext i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.sext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add nsw i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; The addition for `%offset.next` can wrap, so we cannot prove monotonicity.
+;
+; int8_t offset = start;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[sext(offset)] = 0;
+;
+define void @sext_may_wrap(ptr %a, i8 %start, i8 %step) {
+; CHECK-LABEL: 'sext_may_wrap'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (sext i8 {%start,+,%step}<%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (sext i8 {%start,+,%step}<%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ %start, %entry ], [ %offset.next, %loop ]
+  %offset.sext = sext i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.sext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; for (int8_t i = 0; i < 100; i++)
+;   a[zext(offset)] = 0;
+;
+define void @zext_pos(ptr %a) {
+; CHECK-LABEL: 'zext_pos'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: {0,+,1}<nuw><nsw><%loop>
+; CHECK-NEXT:      Monotonicity: MultiSignedMonotonic
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - none!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i8 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset.zext = zext nneg i8 %i to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i8 %i, 1
+  %exitcond = icmp eq i8 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; The zero-extened value of `offset` is no longer monotonic. In fact, the
+; values of `offset` in each iteration are:
+;
+;    iteration |   0 | 1 | 2 | ...
+; -------------|-----|---|---|---------
+;       offset |  -1 | 0 | 1 | ...
+; zext(offset) | 255 | 0 | 1 | ...
+;
+;
+; for (int8_t i = -1; i < 100; i++)
+;   a[zext(offset)] = 0;
+;
+define void @zext_cross_zero(ptr %a) {
+; CHECK-LABEL: 'zext_cross_zero'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (zext i8 {-1,+,1}<nsw><%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (zext i8 {-1,+,1}<nsw><%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i8 [ -1, %entry ], [ %i.inc, %loop ]
+  %offset.zext = zext nneg i8 %i to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i8 %i, 1
+  %exitcond = icmp eq i8 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; In principle, we can prove that `zext(offset)` is monotonic since we know
+; that `offset` is non-negative.
+;
+; int8_t offset = 0;
+; for (int i = 0; i < 100; i++, offset += step)
+;   a[zext(offset)] = 0;
+;
+define void @zext_nneg_nsw(ptr %a, i8 %step) {
+; CHECK-LABEL: 'zext_nneg_nsw'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: (zext i8 {0,+,%step}<nsw><%loop> to i64)
+; CHECK-NEXT:      Monotonicity: Unknown
+; CHECK-NEXT:      Reason: (zext i8 {0,+,%step}<nsw><%loop> to i64)
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - confused!
+;
+entry:
+  br label %loop
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %offset = phi i8 [ 0, %entry ], [ %offset.next, %loop ]
+  %offset.zext = zext nneg i8 %offset to i64
+  %idx = getelementptr i8, ptr %a, i64 %offset.zext
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %offset.next = add nsw i8 %offset, %step
+  %exitcond = icmp eq i64 %i.inc, 100
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
diff --git a/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll b/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll
new file mode 100644
index 0000000000000..8f45dfa3af5dd
--- /dev/null
+++ b/llvm/test/Analysis/DependenceAnalysis/monotonicity-invariant.ll
@@ -0,0 +1,150 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -disable-output -passes="print<da>" -da-dump-monotonicity-report \
+; RUN:     -da-enable-monotonicity-check 2>&1 | FileCheck %s
+
+; for (int i = 0; i < n; i++)
+;   a[x] = 0;
+define void @single_loop_invariant(ptr %a, i64 %x, i64 %n) {
+; CHECK-LABEL: 'single_loop_invariant'
+; CHECK-NEXT:  Monotonicity check:
+; CHECK-NEXT:    Inst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:      Expr: %x
+; CHECK-NEXT:      Monotonicity: Invariant
+; CHECK-EMPTY:
+; CHECK-NEXT:  Src: store i8 0, ptr %idx, align 1 --> Dst: store i8 0, ptr %idx, align 1
+; CHECK-NEXT:    da analyze - consistent output [S]!
+;
+entry:
+  %guard = icmp sgt i64 %n, 0
+  br i1 %guard, label %loop, label %exit
+
+loop:
+  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop ]
+  %idx = getelementptr inbounds i8, ptr %a, i64 %x
+  store i8 0, ptr %idx
+  %i.inc = add nsw i64 %i, 1
+  %exitcond = icmp eq i64 %i.inc, %n
+  br i1 %exitcond, label %exit, label %loop
+
+exit:
+  ret void
+}
+
+; for (int i = 0; i < n; i++)
+;   a[(i % 2 == 0 ? x : y)] = 0;
+define void @single_loop_variant(ptr %a, i64 %x, i64 %y, i64 %n) {
+; CHECK-LABEL: 'single_loop_variant'
+; CHECK-...
[truncated]
I will start looking into this tomorrow, and would like to do a proper review of the main algorithm. Please bear with me as I may be interrupted by other stuff. Thanks a lot.
if (EnableMonotonicityCheck)
  if (MonChecker.checkMonotonicity(Pair[0].Src, OutermostLoop).isUnknown() ||
      MonChecker.checkMonotonicity(Pair[0].Dst, OutermostLoop).isUnknown())
I have a basic question about these two tests here: if we have an AddRec with an nsw flag, that means the AddRec doesn't wrap. Why is that not enough, and why do we need to recursively check each component of the AddRec?
I guess the flags from SCEV assume all the internal components are fixed and only the top-level calculation doesn't overflow? Is that correct?
In that case you may want to have a test case where the top-level AddRec has nsw, but monotonicity fails. I didn't see that in your tests, but in other test files we have examples of that. It would be helpful to add one.
However, the example that I see is this loop (the first test in SameSDLoops.ll):
;;  for (long int i = 0; i < 10; i++) {
;;    for (long int j = 0; j < 10; j++) {
;;      for (long int k = 0; k < 10; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i][j][k][l] = i;
;;      }
;;      for (long int k = 1; k < 11; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i + 4][j + 3][k + 2][l + 1] = l;
It is strange that we cannot prove monotonicity here:
Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>
I have a basic question about these two tests here: If we have an AddRec with a nsw flag, that means this AddRec doesn't wrap. Why that is not enough and we need to recursively check each component of AddRec?
I guess the flags from SCEV assume all the internal components are fixed and only the top level calculation doesn't overflow? Is that correct?
In my understanding, your guess is correct. I added a test case @outer_loop_may_wrap, which I believe demonstrates the scenario where only the outer addrec is guaranteed not to wrap.
However the example that I see is this loop (the first test in SameSDLoops.ll)

;;  for (long int i = 0; i < 10; i++) {
;;    for (long int j = 0; j < 10; j++) {
;;      for (long int k = 0; k < 10; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i][j][k][l] = i;
;;      }
;;      for (long int k = 1; k < 11; k++) {
;;        for (long int l = 0; l < 10; l++)
;;          A[i + 4][j + 3][k + 2][l + 1] = l;

It is strange that we cannot prove monotonicity here:

Printing analysis 'Dependence Analysis' for function 'samebd0':
Monotonicity check:
  Inst:   store i64 %i.013, ptr %arrayidx12, align 8
    Expr: {{{{0,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nuw><nsw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond7.preheader>,+,8}<nuw><nsw><%for.body9>
    Monotonicity: MultiSignedMonotonic
  Inst:   store i64 %l17.04, ptr %arrayidx24, align 8
    Expr: {{{{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>,+,800}<nuw><nsw><%for.cond18.preheader>,+,8}<nuw><nsw><%for.body20>
    Monotonicity: Unknown
    Reason: {{32242408,+,8000000}<nuw><nsw><%for.cond1.preheader>,+,80000}<nw><%for.cond4.preheader>
I don't know much about how nowrap flags are transferred from IR to SCEV, but this appears to be a limitation of SCEV. At a glance, it’s not obvious that the second store  A[i + 4][j + 3][k + 2][l + 1] = l is always executed when entering the j-loop. This may be the reason why the nowrap flags for %for.cond4.preheader are not preserved in SCEV.
Anyway, for this specific case, I think we could perform additional cheap analysis similar to range analysis in SCEV, since all values except the induction variables are constants. That said, I'm not planning to include such a feature in this PR.
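For what it's worth, here is a minimal sketch of what such min/max bookkeeping could look like, assuming the usual ScalarEvolution/APInt headers. The function name and structure are illustrative only; nothing below is part of the patch.

// Rough sketch (not part of the patch) of the "cheap range analysis" idea:
// when every step and every backedge-taken count in a nested affine AddRec is
// a constant, accumulate the minimum and maximum value the expression can
// reach and verify that both still fit in the signed range of its type, i.e.
// that no signed wrap is possible.
static bool addRecStaysInSignedRange(const SCEV *Expr, ScalarEvolution &SE) {
  unsigned BW = SE.getTypeSizeInBits(Expr->getType());
  // Do the bookkeeping in a wider bit width so it cannot itself overflow.
  unsigned WideBW = 2 * std::max(BW, 64u);
  APInt Min(WideBW, 0), Max(WideBW, 0);

  const SCEV *Cur = Expr;
  while (const auto *AR = dyn_cast<SCEVAddRecExpr>(Cur)) {
    const auto *Step = dyn_cast<SCEVConstant>(AR->getStepRecurrence(SE));
    const auto *BTC =
        dyn_cast<SCEVConstant>(SE.getBackedgeTakenCount(AR->getLoop()));
    if (!AR->isAffine() || !Step || !BTC)
      return false;
    // This loop contributes between 0 and Step * BTC to the final value.
    APInt Contrib =
        Step->getAPInt().sext(WideBW) * BTC->getAPInt().zext(WideBW);
    (Contrib.isNegative() ? Min : Max) += Contrib;
    // For the nested form produced by delinearization, the start of an inner
    // AddRec is either an outer AddRec or a constant.
    Cur = AR->getStart();
  }

  const auto *Start = dyn_cast<SCEVConstant>(Cur);
  if (!Start)
    return false;
  Min += Start->getAPInt().sext(WideBW);
  Max += Start->getAPInt().sext(WideBW);

  // No signed wrap iff both extremes fit in the original bit width.
  return Min.sge(APInt::getSignedMinValue(BW).sext(WideBW)) &&
         Max.sle(APInt::getSignedMaxValue(BW).sext(WideBW));
}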
Just an FYI: I also started looking into this, but need a bit of time to get up to speed with it.
if (EnableMonotonicityCheck) {
  if (MonChecker.checkMonotonicity(Pair[I].Src, OutermostLoop).isUnknown())
    return false;
  if (MonChecker.checkMonotonicity(Pair[I].Dst, OutermostLoop).isUnknown())
    return false;
}
Another question here, otherwise LGTM.
If we have multiple subscripts and all of them are monotonic, how could the other monotonicity check (line 4083-4) fail? We need to answer this to make sure we are not running the check redundantly.
Consider the following case (godbolt):
; char A[][32];
; for (i = 0; i < 1ll << 62; i++)
;   for (j = 0; j < 32; j++)
;     if (i < (1ll << 57))
;       A[i][j] = 0;
define void @outer_loop_may_wrap(ptr %a) {
entry:
  br label %loop.i.header
loop.i.header:
  %i = phi i64 [ 0, %entry ], [ %i.inc, %loop.i.latch ]
  br label %loop.j.header
loop.j.header:
  %j = phi i64 [ 0, %loop.i.header ], [ %j.inc, %loop.j.latch ]
  %cond = icmp slt i64 %i, 144115188075855872  ; 2^57
  br i1 %cond, label %if.then, label %loop.j.latch
if.then:
  %gep = getelementptr inbounds [32 x i8], ptr %a, i64 %i, i64 %j
  store i8 0, ptr %gep
  br label %loop.j.latch
loop.j.latch:
  %j.inc = add nuw nsw i64 %j, 1
  %ec.j = icmp eq i64 %j.inc, 32
  br i1 %ec.j, label %loop.i.latch, label %loop.j.header
loop.i.latch:
  %i.inc = add nuw nsw i64 %i, 1
  %ec.i = icmp eq i64 %i.inc, 4611686018427387904  ; 2^62
  br i1 %ec.i, label %exit, label %loop.i.header
exit:
  ret void
}

The subscripts {0,+,1}<nuw><nsw><%loop.i.header> and {0,+,1}<nuw><nsw><%loop.j.header> are monotonic, but the original offset {{0,+,32}<%loop.i.header>,+,1}<nw><%loop.j.header> is not.
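A quick arithmetic check of why (my own addition, not part of the original comment):

max(i) = 2^62 - 1 and max(j) = 31, so
max(32*i + j) = 32*(2^62 - 1) + 31 = 2^67 - 1 > 2^63 - 1 = INT64_MAX

i.e. the combined offset {{0,+,32}<%loop.i.header>,+,1}<%loop.j.header> exceeds the signed 64-bit range over the full iteration space, even though each individual subscript stays far below it.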
Added the above test case.
I went through other and older merge requests and issues to see how we got here, and I think I am now mostly up to speed. Just wanted to share some high-level observations first before I look more into the details.

I think it all started with #148435 and the observation that and an access with

The idea of monotonicity to capture the behaviour that we are not iterating through the whole iteration space again and again makes perfect sense.

At this point, the following is also unclear to me:

I.e., I do not see how unsigned monotonicity is going to be different, but I need to think a bit more about this. Related to this, I am also not in love with the name

Having looked into this better now, I actually do agree that it is a bit questionable whether this belongs in SCEV or here in DA. I think we are describing properties of a SCEV, so SCEV is the first thing that would come to my mind, but I am also okay with continuing here and developing this in-tree while it is off by default. It looks all self-contained and easy to move if we wanted to do this later.

Going to look a bit deeper into this now, but wanted to leave this nit while I am doing that: I liked your example in #148435 (comment) to understand this. If I am not mistaken, you didn't include it here, but it may be interesting to add to your tests.
I just added that case. I'll follow up with more detailed comments shortly.
/// The expression is a (nested) affine AddRec and is monotonically increasing
/// or decreasing in a signed sense with respect to each loop. Monotonicity is
/// checked independently for each loop, and the expression is classified as
/// MultiSignedMonotonic if all AddRecs are nsw. For example, in the following
When I brought up monotonicity, I did not mean to apply it only to AddRec expressions (which already have a FlagNW property), but to SCEVs in general. An expression could overflow while none of its operands does. For instance

for (int i = 0; i < INT_MAX; ++i)
  A[i + 2];

Additions usually get folded into the SCEVAddRecExpr, but other operations are not, such as UDiv, URem, UMax, UMin, ... . That is,

for (int i = 0; i < INT_MAX; ++i)
  A[i/2];

is monotonic.

It is fine if we want to handle only outermost AddRecs at first, but the comment implies this is about AddRec expressions only. The goal was to ensure that the closed range [Expr(first-iteration), Expr(last-iteration)] (or [Expr(last-iteration), Expr(first-iteration)]) describes the entire range of the expression, i.e. there is no i for which Expr(i) falls outside that range. The name "monotonic" came from the fact that last-iteration could be any iteration (SCEV does not know when the loop will actually terminate), and the range would therefore be the range of values Expr could have evaluated to so far, leading to a (not necessarily strictly) increasing or decreasing monotonic function.

For multiple iteration variables, the range could be [Expr(i_first, j_first) .. Expr(i_last, j_last)] (or [Expr(i_last, j_last) .. Expr(i_first, j_first)]), or the combinatorial

ExtremeValueList = {Expr(i_first, j_first), Expr(i_first, j_last), Expr(i_last, j_first), Expr(i_last, j_last)};
Range := [min(ExtremeValueList), max(ExtremeValueList)]

According to the definition of MultiSignedMonotonic, it would be the latter. In my reading of https://en.wikipedia.org/wiki/Monotonic_function, functional analysis would use the first, topology the latter definition.

If on the other side you understand SCEVMonotonicityType as the same as FlagNSW, but taking all loops of the nest into account (not just the one stored in the SCEVAddRecExpr), I would suggest not calling that property "monotonicity".
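To connect this to the patch as posted (my own observation, not from the thread): with the current visitor, the subscript of A[i/2] becomes a SCEVUDivExpr, which goes through invariantOrUnknown and is therefore classified as Unknown, even though as a function of i it never wraps:

F(i) = i / 2 (unsigned division), i in [0, INT_MAX)
F is non-decreasing in i, and its values stay within [0, (INT_MAX - 1) / 2],
so the closed range [F(first-iteration), F(last-iteration)] already covers every
value F can take; it just is not an AddRec, so the current check cannot prove it.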
At first, I was thinking of handling various cases like the ones you mentioned, but now I think it's fine to focus on AddRecs for the time being. I don't have a strong preference, so changing the name seems reasonable to me (it's also a bit questionable that Invariant is included in SCEVMonotonicityType in the first place, as you said in #162280 (comment)).
I'm thinking of renaming it to something like MultivariateWrapType. Since I'm not good at naming, I'd be happy if you have a better idea.
As for the definition, the latter one seems to match my intention.
ExtremeValueList := {Expr(i_first, j_first), Expr(i_first, j_last), Expr(i_last, j_first), Expr(i_last, j_last)};
Range := [min(ExtremeValueList), max(ExtremeValueList)]
IsMonotonic(Expr) := Expr(i, j) is in Range for all i in [i_first, i_last] and j in [j_first, j_last]
(since I slacked off on studying, I don't really understand topology...)
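As a concrete instance of that definition (my own worked example, not from the thread), take F(i, j) = i - j with i, j in [0, 99], i.e. the A[i - j] loop from the patch comment:

ExtremeValueList := {F(0, 0), F(0, 99), F(99, 0), F(99, 99)} = {0, -99, 99, 0}
Range := [-99, 99]

Every F(i, j) with 0 <= i, j <= 99 lies in [-99, 99], so F satisfies the definition, even though it is increasing in i and decreasing in j.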
In the loop below, the monotonicity check fails. This is not a correctness issue, and I am not sure if there will be any realistic performance issue caused by this or not, but I thought to bring it up.

Talking about the performance impact of the monotonicity check, the example that I gave before is more concerning: loops like this are common in Fortran programs. We probably need to look into this, and if the missing flag is not a bug, then the question is how we can safely check dependency for this.

I don't think this issue prevents us from merging this patch, since this is currently the best solution that we know to address some correctness issues. But anyway, this should be investigated. I think we can do at least an initial investigation on our end.
Yeah, we are talking about the SCEV wrapping behavior here, not C/C++ (or other languages) specific behavior.

As for the sign extension, I'm not entirely sure if it's problematic, but it's certain that there are some issues in DA with the handling of sign/zero extensions. I think at least

The fundamental problem is a bit more serious. We need to pay attention to wrapping, even if a value doesn't iterate through the entire space. For instance,

Then the minimum value of

As for the naming, I believe I wrote the intention of the name
It is one of the largest issues in the current implementation. A simpler case like the one below cannot be handled well, since

for (size_t i = 0; i < N; i++)
  A[i] = 0;

I haven't looked into the details of how nowrap flags are inferred in SCEV, but I guess SCEV fails to prove that the second store is executed unconditionally [1]. In fact, inserting a conditional branch before the first store (like this) also causes the nowrap flag to be dropped from the SCEV for the first store, and the monotonicity check fails. Regardless of the cause, I think it should be handled on the DA side. In this specific case, all the variables involved, except for the induction variables, are constants, so I believe it shouldn't be too difficult to prove that they don't wrap. Just holding a possible minimum/maximum value while checking

Footnotes
Even for a much simpler IR we have the same problem:

Source code:
Report:
Command line:

I don't know how the IR of the test case in SameSDLoops.ll is generated and why the first loop there is OK. I suspect that if we start solving this kind of problem in DA, we will eventually reinvent many SCEV wheels in DA.
It's strange to me that

By limiting the scope to cases where the BTC and the operands of the AddRecs are constants, I believe it shouldn't be very complex. If we want to make (potentially significant) changes to SCEV for the sake of DA to support more complex cases, then I think we should first get DA into the default pipeline and demonstrate its usefulness...
On some 64-bit platforms, the higher bits of an address are ignored. Sometimes it is even possible to use that to store additional data (see the TBI feature in AArch64). I believe on x86 we have a similar situation, but the top bits need to follow a canonical form. Doesn't this allow us to assume a correct program won't have overflow in address calculation? (EDIT: Same thing for subscripts. They cannot be arbitrary 64-bit values. The effective bit-width should be smaller, and I suspect that may allow us to ignore any concern about AddRec value wrapping.)
/// The type of monotonicity of a SCEV. This property is defined with respect to
/// the outermost loop that DA is analyzing.
///
/// This is designed to classify the behavior of AddRec expressions, and does
/// not care about other SCEVs. For example, given the two loop-invariant values
/// `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.
Using Monotonic is fine if the documentation/definition reflects what it is meant to be, even if it only implements it for AddRecs atm. That would give a clear path for how it could be extended.
This property is defined with respect to the outermost loop that DA is analyzing.
Could be understood as FlagNSW of the outermost loop only, but I think you mean wrapping of any nested loop.
This is designed to classify the behavior of AddRec expressions, and does not care about other SCEVs.
The current doxygen for SCEVMonotonicityType says it is only for AddRecs, and mixes monotonicity and wrapping (I think we consider monotonicity to imply no-wrapping, so if a wrapping AddRec is found it cannot be monotonic, but the other way around may not be true, e.g. with a non-affine SCEVAddRecExpr). Only caring about AddRecs seems arbitrary. Why? What is the property we want to ensure? Could you use a clearer definition?
For example, given the two loop-invariant values `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.

I think this is easier to explain with monotonicity, which is always relative to the variates, the loop induction variables in this case. A and B are just constants (so A + B also evaluates to just a constant, even if it is the result of an overflow, and could have been hoisted out of the loop), i.e. the function over which we define monotonicity is f_{A,B}(i), where i is the variate.
Yeah, it's clearly mixing up the definition and the implementation. The definition itself doesn't need to be limited to AddRecs. I'll rewrite the definition.
I think we consider monotonicity to imply no-wrapping, so if a wrapping AddRec is found it cannot be monotonic
Correct, that's what I tried to describe.
For example, given the two loop-invariant values `A` and `B`, `A + B` is treated as Invariant even if the addition wraps.

I think this is easier to explain with monotonicity, which is always relative to the variates, the loop induction variables in this case. `A` and `B` are just constants (so `A + B` also evaluates to just a constant, even if it is the result of an overflow, and could have been hoisted out of the loop), i.e. the function over which we define monotonicity is f_{A,B}(i), where i is the variate.
Do you mean that monotonicity is not defined for constants in the first place?
Revised the definition of monotonicity. I believe it's better than before...
Do you mean that monotonicity is not defined for constants in the first place?

Constants, in the sense of a parametric function, are not the subject of the monotonicity property, only the function argument/variate is. It is the question "when modifying x, how does c (

For loop nests, c?", or answer with the most pessimistic result for any Unknown, i.e. using a forall quantifier. For the

For DA, it just means that within the scope of the analysis, a and b are invariant/constant, and a + b is just some value, no matter how it was computed. It could also be an unknown function g(a,b), as long as we know that the result of g only depends on its arguments.
Thanks, I think I got it. It seems that I was mixing up parameters and arguments for a parametric function.
LGTM, please still consider the others' remarks.
/// sense. Note that the multimonotonic function may also be a constant
/// function. The order employed in the definition of monotonicity is not
/// strict order.
MultivariateSignedMonotonic,
Used "multimonotonic" in the description, but "Multivariate monotonic" here. Consistency?
Unified to "multivariate monotonic".
My mistake. I think this makes the situation worse, not better. For a platform that ignores the upper 8 bits of a 64-bit pointer, even in a simple loop like this

Every time the lower 56 bits of

I don't have objections to merging this patch. This works well if the effective pointer size is 32 or 64. But I think this issue needs to be discussed further, independent of this patch (for example, I suspect we can write test cases that are vectorized but are not legal, and it is a question for me how important it is to fix all these unlikely corner cases -- EDIT: in particular, could this kind of bug be a security issue?).
/// F(i_1, ..., i_{k-1}, x, i_{k+1}, ..., i_N) <=
/// F(i_1, ..., i_{k-1}, y, i_{k+1}, ..., i_N)
///
/// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, y in their domains.
Feels like a verb is missing in this sentence, maybe something like:
- /// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, y in their domains.
+ /// where i_1, ..., i_{k-1}, i_{k+1}, ..., i_N, x, and y are elements of their
+ /// respective domains.
Thanks, fixed.
/// F(i_1, ..., i_{k-1}, x, i_{k+1}, ..., i_N) >=
/// F(i_1, ..., i_{k-1}, y, i_{k+1}, ..., i_N)
///
/// A function F with either monotonically increasing or decreasing with
- /// A function F with either monotonically increasing or decreasing with
+ /// A function F that is monotonically increasing or decreasing with
Fixed.
Excuse: I mimicked this phrase from Wikipedia:
"A function with either property is called strictly monotone."
Probably I need to learn English harder...
/// "monotonic with respect to k-th loop".
///
/// A function F is said to be "multimonotonic" when it is monotonic with
/// respect to all of the N loops.
This is for my own understanding, but I guess it could be generally beneficial: can you elaborate here on why multimonotonic is important? E.g. why is this required, and for what?
Since this is just a definition, I'd prefer to avoid adding such context here. Instead, I added a brief description to SCEVMonotonicityChecker.
/// sense. Note that the inequality "x <= y" merely indicates loop progression
/// and is not affected by the difference between signed and unsigned order.
///
/// Currently we only consider monotonicity in a signed sense.
A lot of things have been discussed in the comments. I need to catch up on some, but that is my problem. What I was going to ask is this: I really like the clear descriptions so far, but can we add, or is it worth explaining a little bit more, the algorithm that determines monotonicity? That would be a high-level description of course, but I feel the explanation of the concepts of looking at the AddRec and the nowrap flags etc. is missing a little bit.
Added some more comments.
I think you should definitely do that, and raise issues for them. That is the first step to discussing this further, I think, if we can find such examples.

But this code is not portable, it is not language conforming? Thus, this example and its behaviour are tied to this particular platform, but the behaviour isn't specified and could be anything. Reading past the array is definitely a security problem, but I am not yet connecting the dots here with the SCEV behaviour.
I'm not sure if this answers your question, but you might want to take a look at the LangRef for details on how address calculations are interpreted in LLVM. I'm also not entirely sure, but it might be true that we should pay a bit more attention to the pointer index type, which we should be able to get from functions like
We are not accessing any out-of-bounds memory in the above example. I am not sure whether this violates any language standard or not; I will look into it. The fundamental issue is that two different 64-bit addresses may point to the same memory location, at least on AArch64. On x86 the situation may be different, since the upper bits of an address have to be in a canonical form. I will do some more investigation on x86 as well, and then open an issue to discuss the implications.

Regarding the bug, I haven't checked vectorization yet, but I have another bug. Basically alias analysis thinks

Compile with O0 and O3. The results are different.
Thanks for the example, @amehsan.
You are conflating the physical machine behavior with the abstract machine model. As far as LLVM is concerned, doing an access at address

Using TBI in a way that is compatible with LLVM's memory model is actually quite tricky (*), but I don't think this is really relevant to what this PR is doing.

(*) One way to model this is to destroy the old provenance and allocate a fresh one whenever the TBI bits change. This effectively means that from an AM perspective, the object only ever lives at a single location at any given time.
You are right. I missed that. Adding provenance to the picture, I suspect there might be a more permissive algorithm for monotonicity. I need to do some more work on it. Will post an update by Monday.
I explain the main idea with an example here. If this is correct and can be made to work, it will be independent of this patch and probably something that can be implemented earlier in the pipeline. Consider the following loop. Also this part of the LangRef about

The value of

Essentially we prove that

(EDIT: Unless I hear something that undermines the whole idea for some reason, we will follow up on this separately as an independent feature. RFC, etc.)

(*) AArch64 and x86 machines use less than 64 bits for virtual addresses, as discussed before. A machine that uses e.g. 56 bits for addressing cannot allocate an object of size larger than

(**) I am not sure DataLayout is the best place. The bound could be
As mentioned above, I think this patch should be merged independently of the idea that I explained.
@amehsan The size of allocated objects is already limited to at most half the address space, see https://llvm.org/docs/LangRef.html#allocated-objects. The kind of reasoning you do is indeed valid (*), and something that LAA already does, see:

llvm-project/llvm/lib/Analysis/LoopAccessAnalysis.cpp, lines 1026 to 1033 in ef87da0

However, this reasoning critically depends on the assumption that the GEP feeds into a memory access that will be executed. The way this is handled in LAA is not entirely correct, see the fix in #161445. We can't use this reasoning in conditionally executed code.

(*) It may not be valid in multi-provenance models -- but that's mostly an argument against having a multi-provenance model...
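To spell out the flavour of that reasoning (my own paraphrase, not a description of the exact LAA code):

assume: A is a single allocated object, so size(A) <= 2^63 - 1 bytes (the LangRef limit)
assume: the access A[i] (element size S) is actually executed on a given iteration
  => the byte offset i * S stays within the object on that iteration
  => |i * S| < 2^63, so the offset, and the AddRec feeding the inbounds GEP,
     fits in a signed 64-bit integer and cannot have wrapped on that iteration

The argument only covers iterations on which the access really executes, which is why it cannot be applied to conditionally executed code.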
OK, I am not sure if pushing down this upper bound could help or not.

Correct.

Two comments on this: From a very quick glance, I suspect there might be some bugs in this code. If the offset of the GEP is a function

I believe this would be more useful somewhere else, not in LAA or DA. For example, if this reasoning can infer nsw/nuw flags for the induction variable of a loop, that might have wider benefit than implementing it in LAA or DA. Another candidate could be in SCEV.
After working on a few examples, I think I need to look more closely into the existing LAA code. Thanks for the information.
Since there don't seem to be any objections, I will go ahead and land the changes. Thanks for the reviews!
@amehsan JFYI: This commit message really helped me understand the nusw-based inference in LAA.
Thanks. I am wondering about cases like this:

Where your offset spans a much smaller interval (so we cannot rely on UB from the object size). Somehow this is taken care of in vectorization. I believe that happens before we reach this part of the LAA code, and so here we don't need to worry about it. For this case, we check and if n is larger than a threshold (which depends on the size of
| 
 Seems correct. In my understanding, 
 I guess  | 
| 
 Yes, it is probably provably correct for affine AddRecs, and affine AddRecs should cover most practically interesting cases. We'll look into implementing this logic earlier in the pipeline so it can help DA and potentially other passes too. | 
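For contrast, here is a hypothetical subscript that is not an affine AddRec (names are mine): `%j` accumulates the induction variable, so it takes the values 0, 0, 1, 3, 6, ... and its SCEV is a quadratic AddRec along the lines of `{0,+,0,+,1}`, which the affine-only argument above would not cover.

```llvm
; Hypothetical illustration: %j accumulates %i, so the subscript SCEV is a
; non-affine (quadratic) AddRec rather than the affine {start,+,step} form
; the discussion above is restricted to.
define void @nonaffine_subscript(ptr %a, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %j = phi i64 [ 0, %entry ], [ %j.next, %loop ]
  %gep = getelementptr inbounds i32, ptr %a, i64 %j
  store i32 0, ptr %gep
  %j.next = add nsw i64 %j, %i
  %i.next = add nuw nsw i64 %i, 1
  %exit.cond = icmp slt i64 %i.next, %n
  br i1 %exit.cond, label %loop, label %exit

exit:
  ret void
}
```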
* [flang] Fix standalone build regression from llvm#161179 (llvm#164309) Fix incorrect linking and dependencies introduced in llvm#161179 that break standalone builds of Flang. Signed-off-by: Michał Górny <[email protected]> * [AMDGPU] Remove magic constants from V_PK_ADD_F32 pattern. NFC (llvm#164335) * [AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (llvm#161638) They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+. This breaks transitivity however, for example if we have the following sequence of events in one thread: - some stores - store atomic release syncscope("workgroup") - barrier then another thread follows with - barrier - load atomic acquire - store atomic release syncscope("agent") It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves. We also cannot strengthen our release fences any further to allow for releasing other wave's stores because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It'd also add yet another level of complexity to code sequences, with both acquire/release having CU-mode only alternatives. Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq, or the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance. So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical (vL0 inv is not needed in CU mode) code for release (or stronger) atomic ordering. This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed. Supersedes llvm#160501 Solves SC1-6454 * [InstSimplify] Support ptrtoaddr in simplifyGEPInst() (llvm#164262) This adds support for ptrtoaddr in the `ptradd p, ptrtoaddr(p2) - ptrtoaddr(p) -> p2` fold. This fold requires that p and p2 have the same underlying object (otherwise the provenance may not be the same). The argument I would like to make here is that because the underlying objects are the same (and the pointers in the same address space), the non-address bits of the pointer must be the same. Looking at some specific cases of underlying object relationship: * phi/select: Trivially true. * getelementptr: Only modifies address bits, non-address bits must remain the same. * addrspacecast round-trip cast: Must preserve all bits because we optimize such round-trip casts away. * non-interposable global alias: I'm a bit unsure about this one, but I guess the alias and the aliasee must have the same non-address bits? * various intrinsics like launder.invariant.group, ptrmask. I think these all either preserve all pointer bits (like the invariant.group ones) or at least the non-address bits (like ptrmask). There are some interesting cases like amdgcn.make.buffer.rsrc, but those are cross address-space. ----- There is a second `gep (gep p, C), (sub 0, ptrtoint(p)) -> C` transform in this function, which I am not extending to handle ptrtoaddr, adding negative tests instead. This transform is overall dubious for provenance reasons, but especially dubious with ptrtoaddr, as then we don't have the guarantee that provenance of `p` has been exposed. 
* [Hexagon] Add REQUIRES: asserts to test This test uses -debug-only, so needs an assertion-enabled build. * [AArch64] Combing scalar_to_reg into DUP if the DUP already exists (llvm#160499) If we already have a dup(x) as part of the DAG along with a scalar_to_vec(x), we can re-use the result of the dup to the scalar_to_vec(x). * [CAS] OnDiskGraphDB - fix MSVC "not all control paths return a value" warnings. NFC. (llvm#164369) * Reapply "[libc++] Optimize __hash_table::erase(iterator, iterator)" (llvm#162850) This reapplication fixes the use after free caused by not properly updating the bucket list in one case. Original commit message: Instead of just calling the single element `erase` on every element of the range, we can combine some of the operations in a custom implementation. Specifically, we don't need to search for the previous node or re-link the list every iteration. Removing this unnecessary work results in some nice performance improvements: ``` ----------------------------------------------------------------------------------------------------------------------- Benchmark old new ----------------------------------------------------------------------------------------------------------------------- std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0 457 ns 459 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32 995 ns 626 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024 18196 ns 7995 ns std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192 124722 ns 70125 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0 456 ns 461 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32 1183 ns 769 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024 27827 ns 18614 ns std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192 266681 ns 226107 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0 455 ns 462 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32 996 ns 659 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024 15963 ns 8108 ns std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192 136493 ns 71848 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0 454 ns 455 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32 985 ns 703 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024 16277 ns 9085 ns std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192 125736 ns 82710 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0 457 ns 454 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32 1091 ns 646 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024 17784 ns 7664 ns std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192 127098 ns 72806 ns ``` This reverts commit acc3a62. 
* [TableGen] List the indices of sub-operands (llvm#163723) Some instances of the `Operand` class used in Tablegen instruction definitions expand to a cluster of multiple operands at the MC layer, such as complex addressing modes involving base + offset + shift, or clusters of operands describing conditional Arm instructions or predicated MVE instructions. There's currently no convenient way for C++ code to know the offset of one of those sub-operands from the start of the cluster: instead it just hard-codes magic numbers like `index+2`, which is hard to read and fragile. This patch adds an extra piece of output to `InstrInfoEmitter` to define those instruction offsets, based on the name of the `Operand` class instance in Tablegen, and the names assigned to the sub-operands in the `MIOperandInfo` field. For example, if target Foo were to define def Bar : Operand { let MIOperandInfo = (ops GPR:$first, i32imm:$second); // ... } then the new constants would be `Foo::SUBOP_Bar_first` and `Foo::SUBOP_Bar_second`, defined as 0 and 1 respectively. As an example, I've converted some magic numbers related to the MVE predication operand types (`vpred_n` and its superset `vpred_r`) to use the new named constants in place of the integer literals they previously used. This is more verbose, but also clearer, because it explains why the integer is chosen instead of what its value is. * [lldb] Add bidirectional packetLog to gdbclientutils.py (llvm#162176) While debugging the tests for llvm#155000 I found it helpful to have both sides of the simulated gdb-rsp traffic rather than just the responses so I've extended the packetLog in MockGDBServerResponder to record traffic in both directions. Tests have been updated accordingly * [MLIR] [Vector] Added canonicalizer for folding from_elements + transpose (llvm#161841) ## Description Adds a new canonicalizer that folds `vector.from_elements(vector.transpose))` => `vector.from_elements`. This canonicalization reorders the input elements for `vector.from_elements`, adjusts the output shape to match the effect of the transpose op and eliminating its need. ## Testing Added a 2D vector lit test that verifies the working of the rewrite. --------- Signed-off-by: Keshav Vinayak Jha <[email protected]> * [DA] Add initial support for monotonicity check (llvm#162280) The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases. This patch introduces the notion of *monotonicity* and a validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories: - Unknown: Nothing is known about the monotonicity of the SCEV. - Invariant: The SCEV is loop-invariant. - MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest. The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity. The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient. Split off from llvm#154527. 
* [VPlan] Use VPlan::getRegion to shorten code (NFC) (llvm#164287) * [VPlan] Improve code using m_APInt (NFC) (llvm#161683) * [SystemZ] Avoid trunc(add(X,X)) patterns (llvm#164378) Replace with trunc(add(X,Y)) to avoid premature folding in upcoming patch llvm#164227 * [clang][CodeGen] Emit `llvm.tbaa.errno` metadata during module creation Let Clang emit `llvm.tbaa.errno` metadata in order to let LLVM carry out optimizations around errno-writing libcalls to, as long as it is proved the involved memory location does not alias `errno`. Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972. * [LV][NFC] Remove undef from phi incoming values (llvm#163762) Split off from PR llvm#163525, this standalone patch replaces use of undef as incoming PHI values with zero, in order to reduce the likelihood of contributors hitting the `undef deprecator` warning in github. * [DA] Add option to enable specific dependence test only (llvm#164245) PR llvm#157084 added an option `da-run-siv-routines-only` to run only SIV routines in DA. This PR replaces that option with a more fine-grained one that allows to select other than SIV routines as well. This option is useful for regression testing of individual DA routines. This patch also reorganizes regression tests that use `da-run-siv-routines-only`. * [libcxx] Optimize `std::generate_n` for segmented iterators (llvm#164266) Part of llvm#102817. This is a natural follow-up to llvm#163006. We are forwarding `std::generate_n` to `std::__for_each_n` (`std::for_each_n` needs c++17), resulting in improved performance for segmented iterators. before: ``` std::generate_n(deque<int>)/32 17.5 ns 17.3 ns 40727273 std::generate_n(deque<int>)/50 25.7 ns 25.5 ns 26352941 std::generate_n(deque<int>)/1024 490 ns 487 ns 1445161 std::generate_n(deque<int>)/8192 3908 ns 3924 ns 179200 ``` after: ``` std::generate_n(deque<int>)/32 11.1 ns 11.0 ns 64000000 std::generate_n(deque<int>)/50 16.1 ns 16.0 ns 44800000 std::generate_n(deque<int>)/1024 291 ns 292 ns 2357895 std::generate_n(deque<int>)/8192 2269 ns 2250 ns 298667 ``` * [BOLT] Check entry point address is not in constant island (llvm#163418) There are cases where `addEntryPointAtOffset` is called with a given `Offset` that points to an address within a constant island. This triggers `assert(!isInConstantIsland(EntryPointAddress)` and causes BOLT to crash. This patch adds a check which ignores functions that would add such entry points and warns the user. * [llvm][dwarfdump] Pretty-print DW_AT_language_version (llvm#164222) In both verbose and non-verbose mode we will now use the `llvm::dwarf::LanguageDescription` to turn the version into a human readable string. In verbose mode we also display the raw version code (similar to how we display addresses in verbose mode). To make the version code and prettified easier to distinguish, we print the prettified name in colour (if available), which is consistent with how `DW_AT_language` is printed in colour. 
Before: ``` 0x0000000c: DW_TAG_compile_unit DW_AT_language_name (DW_LNAME_C) DW_AT_language_version (201112) ``` After: ``` 0x0000000c: DW_TAG_compile_unit DW_AT_language_name (DW_LNAME_C) DW_AT_language_version (201112 C11) ``` --------- Signed-off-by: Michał Górny <[email protected]> Signed-off-by: Keshav Vinayak Jha <[email protected]> Co-authored-by: Michał Górny <[email protected]> Co-authored-by: Stanislav Mekhanoshin <[email protected]> Co-authored-by: Pierre van Houtryve <[email protected]> Co-authored-by: Nikita Popov <[email protected]> Co-authored-by: David Green <[email protected]> Co-authored-by: Simon Pilgrim <[email protected]> Co-authored-by: Nikolas Klauser <[email protected]> Co-authored-by: Simon Tatham <[email protected]> Co-authored-by: Daniel Sanders <[email protected]> Co-authored-by: Keshav Vinayak Jha <[email protected]> Co-authored-by: Ryotaro Kasuga <[email protected]> Co-authored-by: Ramkumar Ramachandra <[email protected]> Co-authored-by: Antonio Frighetto <[email protected]> Co-authored-by: David Sherwood <[email protected]> Co-authored-by: Connector Switch <[email protected]> Co-authored-by: Asher Dobrescu <[email protected]> Co-authored-by: Michael Buch <[email protected]>

The dependence testing functions in DA assume that the analyzed AddRec does not wrap over the entire iteration space. For AddRecs that may wrap, DA should conservatively return unknown dependence. However, no validation is currently performed to ensure that this condition holds, which can lead to incorrect results in some cases.

This patch introduces the notion of *monotonicity* and validation logic to check whether a SCEV is monotonic. The monotonicity check classifies the SCEV into one of the following categories:

- Unknown: Nothing is known about the monotonicity of the SCEV.
- Invariant: The SCEV is loop-invariant.
- MultivariateSignedMonotonic: The SCEV doesn't wrap in a signed sense for any iteration of the loops in the loop nest.

The current validation logic basically searches an affine AddRec recursively and checks whether the `nsw` flag is present. Notably, it is still unclear whether we should also have a category for unsigned monotonicity.

The monotonicity check is still under development and disabled by default for now. Since such a check is necessary to make DA sound, it should be enabled by default once the functionality is sufficient.

Split off from #154527.
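As a rough illustration of the three categories (my own sketch, not a test from the patch; the exact classification of course depends on what SCEV proves for the concrete IR): a loop-invariant subscript maps to Invariant, an affine AddRec whose no-wrap property SCEV can establish is the MultivariateSignedMonotonic candidate, and a subscript whose AddRec carries no `nsw` falls back to Unknown.

```llvm
; Sketch only: three subscripts meant to line up with the three categories
; described above. Whether SCEV attaches <nsw> to a given AddRec depends on
; its own reasoning, so treat the comments as intent, not guaranteed output.
define void @monotonicity_categories(ptr %a, ptr %b, ptr %c, i64 %k, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
  %j = phi i64 [ 0, %entry ], [ %j.next, %loop ]

  ; Invariant: the subscript %k does not depend on the loop at all.
  %gep.inv = getelementptr inbounds i32, ptr %a, i64 %k
  store i32 0, ptr %gep.inv

  ; MultivariateSignedMonotonic candidate: affine AddRec with an nsw step.
  %gep.mono = getelementptr inbounds i32, ptr %b, i64 %i
  store i32 1, ptr %gep.mono

  ; Unknown: the increment of %j carries no nsw, so its AddRec may wrap.
  %gep.unk = getelementptr i32, ptr %c, i64 %j
  store i32 2, ptr %gep.unk

  %i.next = add nsw i64 %i, 1
  %j.next = add i64 %j, %n
  %cond = icmp slt i64 %i.next, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret void
}
```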