Skip to content

Conversation

@usx95
Copy link
Contributor

@usx95 usx95 commented Nov 1, 2025

Optimize the FactManager and DataflowAnalysis classes by using vector-based storage with ID-based lookups instead of maps.

  • Added a FactID type using the utils::ID template to uniquely identify facts
  • Modified Fact class to store and manage IDs
  • Changed FactManager to use vector-based storage indexed by block ID instead of a map
  • Updated DataflowAnalysis to use vector-based storage for states instead of maps
  • Modified lookups to use ID-based indexing for better performance

Improves compile time hit on long-tail targets like tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/TargetBuiltins/RISCV.cpp.o​ from 21% to 3.2%

Copy link
Contributor Author

usx95 commented Nov 1, 2025

@usx95 usx95 changed the title Avoid using DenseMap for CFGBlock and program points [LifetimeSafety] Optimize fact storage with IDs and vector-based lookup Nov 1, 2025
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch 4 times, most recently from f48ebcf to 7e6051b Compare November 1, 2025 04:21
@usx95 usx95 marked this pull request as ready for review November 1, 2025 04:24
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) labels Nov 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 1, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang-temporal-safety

Author: Utkarsh Saxena (usx95)

Changes

Optimize the FactManager and DataflowAnalysis classes by using vector-based storage with ID-based lookups instead of maps.

  • Added a FactID type using the utils::ID template to uniquely identify facts
  • Modified Fact class to store and manage IDs
  • Changed FactManager to use vector-based storage indexed by block ID instead of a map
  • Updated DataflowAnalysis to use vector-based storage for states instead of maps
  • Modified lookups to use ID-based indexing for better performance

Improves compile time hit on long-tail targets like tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/TargetBuiltins/RISCV.cpp.o​ from 21% to 3.2%


Full diff: https://github.com/llvm/llvm-project/pull/165963.diff

4 Files Affected:

  • (modified) clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h (+23-8)
  • (modified) clang/lib/Analysis/LifetimeSafety/Dataflow.h (+9-5)
  • (modified) clang/lib/Analysis/LifetimeSafety/Facts.cpp (+5-8)
  • (modified) clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp (+1)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
index 063cb5c2d42ab..b9cad5340c940 100644
--- a/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety/Facts.h
@@ -16,6 +16,7 @@
 
 #include "clang/Analysis/Analyses/LifetimeSafety/Loans.h"
 #include "clang/Analysis/Analyses/LifetimeSafety/Origins.h"
+#include "clang/Analysis/Analyses/LifetimeSafety/Utils.h"
 #include "clang/Analysis/AnalysisDeclContext.h"
 #include "clang/Analysis/CFG.h"
 #include "llvm/ADT/SmallVector.h"
@@ -23,6 +24,9 @@
 #include <cstdint>
 
 namespace clang::lifetimes::internal {
+
+using FactID = utils::ID<struct FactTag>;
+
 /// An abstract base class for a single, atomic lifetime-relevant event.
 class Fact {
 
@@ -48,6 +52,7 @@ class Fact {
 
 private:
   Kind K;
+  FactID ID;
 
 protected:
   Fact(Kind K) : K(K) {}
@@ -56,6 +61,9 @@ class Fact {
   virtual ~Fact() = default;
   Kind getKind() const { return K; }
 
+  void setID(FactID ID) { this->ID = ID; }
+  FactID getID() const { return ID; }
+
   template <typename T> const T *getAs() const {
     if (T::classof(this))
       return static_cast<const T *>(this);
@@ -183,22 +191,26 @@ class TestPointFact : public Fact {
 
 class FactManager {
 public:
+  void init(const CFG &Cfg) {
+    assert(BlockToFacts.empty() && "FactManager already initialized");
+    BlockToFacts.resize(Cfg.getNumBlockIDs());
+  }
+
   llvm::ArrayRef<const Fact *> getFacts(const CFGBlock *B) const {
-    auto It = BlockToFactsMap.find(B);
-    if (It != BlockToFactsMap.end())
-      return It->second;
-    return {};
+    return BlockToFacts[B->getBlockID()];
   }
 
   void addBlockFacts(const CFGBlock *B, llvm::ArrayRef<Fact *> NewFacts) {
     if (!NewFacts.empty())
-      BlockToFactsMap[B].assign(NewFacts.begin(), NewFacts.end());
+      BlockToFacts[B->getBlockID()].assign(NewFacts.begin(), NewFacts.end());
   }
 
   template <typename FactType, typename... Args>
   FactType *createFact(Args &&...args) {
     void *Mem = FactAllocator.Allocate<FactType>();
-    return new (Mem) FactType(std::forward<Args>(args)...);
+    FactType *Res = new (Mem) FactType(std::forward<Args>(args)...);
+    Res->setID(NextFactID++);
+    return Res;
   }
 
   void dump(const CFG &Cfg, AnalysisDeclContext &AC) const;
@@ -214,16 +226,19 @@ class FactManager {
   /// \note This is intended for testing only.
   llvm::StringMap<ProgramPoint> getTestPoints() const;
 
+  unsigned getNumFacts() const { return NextFactID.Value; }
+
   LoanManager &getLoanMgr() { return LoanMgr; }
   const LoanManager &getLoanMgr() const { return LoanMgr; }
   OriginManager &getOriginMgr() { return OriginMgr; }
   const OriginManager &getOriginMgr() const { return OriginMgr; }
 
 private:
+  FactID NextFactID{0};
   LoanManager LoanMgr;
   OriginManager OriginMgr;
-  llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<const Fact *>>
-      BlockToFactsMap;
+  /// Facts for each CFG block, indexed by block ID.
+  llvm::SmallVector<llvm::SmallVector<const Fact *>> BlockToFacts;
   llvm::BumpPtrAllocator FactAllocator;
 };
 } // namespace clang::lifetimes::internal
diff --git a/clang/lib/Analysis/LifetimeSafety/Dataflow.h b/clang/lib/Analysis/LifetimeSafety/Dataflow.h
index 2f7bcb6e5dc81..de821bb17eb6b 100644
--- a/clang/lib/Analysis/LifetimeSafety/Dataflow.h
+++ b/clang/lib/Analysis/LifetimeSafety/Dataflow.h
@@ -67,10 +67,10 @@ class DataflowAnalysis {
   llvm::DenseMap<const CFGBlock *, Lattice> InStates;
   /// The dataflow state after a basic block is processed.
   llvm::DenseMap<const CFGBlock *, Lattice> OutStates;
-  /// The dataflow state at a Program Point.
+  /// Dataflow state at each program point, indexed by Fact ID.
   /// In a forward analysis, this is the state after the Fact at that point has
   /// been applied, while in a backward analysis, it is the state before.
-  llvm::DenseMap<ProgramPoint, Lattice> PerPointStates;
+  llvm::SmallVector<Lattice> PointToState;
 
   static constexpr bool isForward() { return Dir == Direction::Forward; }
 
@@ -86,6 +86,8 @@ class DataflowAnalysis {
     Derived &D = static_cast<Derived &>(*this);
     llvm::TimeTraceScope Time(D.getAnalysisName());
 
+    PointToState.resize(FactMgr.getNumFacts());
+
     using Worklist =
         std::conditional_t<Dir == Direction::Forward, ForwardDataflowWorklist,
                            BackwardDataflowWorklist>;
@@ -116,7 +118,9 @@ class DataflowAnalysis {
   }
 
 protected:
-  Lattice getState(ProgramPoint P) const { return PerPointStates.lookup(P); }
+  Lattice getState(ProgramPoint P) const {
+    return PointToState[P->getID().Value];
+  }
 
   std::optional<Lattice> getInState(const CFGBlock *B) const {
     auto It = InStates.find(B);
@@ -144,12 +148,12 @@ class DataflowAnalysis {
     if constexpr (isForward()) {
       for (const Fact *F : Facts) {
         State = transferFact(State, F);
-        PerPointStates[F] = State;
+        PointToState[F->getID().Value] = State;
       }
     } else {
       for (const Fact *F : llvm::reverse(Facts)) {
         // In backward analysis, capture the state before applying the fact.
-        PerPointStates[F] = State;
+        PointToState[F->getID().Value] = State;
         State = transferFact(State, F);
       }
     }
diff --git a/clang/lib/Analysis/LifetimeSafety/Facts.cpp b/clang/lib/Analysis/LifetimeSafety/Facts.cpp
index ecde390cd6ca3..4a4172fe55bf3 100644
--- a/clang/lib/Analysis/LifetimeSafety/Facts.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/Facts.cpp
@@ -64,8 +64,8 @@ void TestPointFact::dump(llvm::raw_ostream &OS, const LoanManager &,
 
 llvm::StringMap<ProgramPoint> FactManager::getTestPoints() const {
   llvm::StringMap<ProgramPoint> AnnotationToPointMap;
-  for (const CFGBlock *Block : BlockToFactsMap.keys()) {
-    for (const Fact *F : getFacts(Block)) {
+  for (const auto &BlockFacts : BlockToFacts) {
+    for (const Fact *F : BlockFacts) {
       if (const auto *TPF = F->getAs<TestPointFact>()) {
         StringRef PointName = TPF->getAnnotation();
         assert(AnnotationToPointMap.find(PointName) ==
@@ -88,12 +88,9 @@ void FactManager::dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
   // Print blocks in the order as they appear in code for a stable ordering.
   for (const CFGBlock *B : *AC.getAnalysis<PostOrderCFGView>()) {
     llvm::dbgs() << "  Block B" << B->getBlockID() << ":\n";
-    auto It = BlockToFactsMap.find(B);
-    if (It != BlockToFactsMap.end()) {
-      for (const Fact *F : It->second) {
-        llvm::dbgs() << "    ";
-        F->dump(llvm::dbgs(), LoanMgr, OriginMgr);
-      }
+    for (const Fact *F : getFacts(B)) {
+      llvm::dbgs() << "    ";
+      F->dump(llvm::dbgs(), LoanMgr, OriginMgr);
     }
     llvm::dbgs() << "  End of Block\n";
   }
diff --git a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
index 00c7ed90503e7..a51ba4280f284 100644
--- a/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
+++ b/clang/lib/Analysis/LifetimeSafety/LifetimeSafety.cpp
@@ -41,6 +41,7 @@ void LifetimeSafetyAnalysis::run() {
   const CFG &Cfg = *AC.getCFG();
   DEBUG_WITH_TYPE("PrintCFG", Cfg.dump(AC.getASTContext().getLangOpts(),
                                        /*ShowColors=*/true));
+  FactMgr.init(Cfg);
 
   FactsGenerator FactGen(FactMgr, AC);
   FactGen.run();

@usx95 usx95 requested review from Xazax-hun and ymand November 1, 2025 04:26
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch from 7e6051b to 66119a7 Compare November 4, 2025 15:53
@usx95 usx95 force-pushed the users/usx95/10-30-persistent-origin-optimisation branch from 33c0c70 to 0bbc422 Compare November 4, 2025 15:53
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch from 66119a7 to e177f8d Compare November 7, 2025 03:02
@usx95 usx95 force-pushed the users/usx95/10-30-persistent-origin-optimisation branch from 0bbc422 to 667242a Compare November 7, 2025 03:02
Base automatically changed from users/usx95/10-30-persistent-origin-optimisation to main November 7, 2025 03:02
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch 3 times, most recently from 9c32808 to 5029ce8 Compare November 7, 2025 03:41
Copy link
Contributor Author

usx95 commented Nov 7, 2025

Merge activity

  • Nov 7, 3:41 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 3:57 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 4:00 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 4:05 AM UTC: Graphite rebased this pull request as part of a merge.
  • Nov 7, 4:27 AM UTC: @usx95 merged this pull request with Graphite.

@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch 2 times, most recently from 2dd60af to 2f4af6f Compare November 7, 2025 03:59
@usx95 usx95 force-pushed the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch from 2f4af6f to 0dafeb9 Compare November 7, 2025 04:04
@usx95 usx95 merged commit c9b4169 into main Nov 7, 2025
7 of 9 checks passed
@usx95 usx95 deleted the users/usx95/11-01-avoid_using_densemap_for_cfgblock_and_program_points branch November 7, 2025 04:27
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 7, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-darwin running on doug-worker-3 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/23/builds/15335

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'Clang :: Headers/stdckdint.c' FAILED ********************
Exit Code: 127

Command Output (stdout):
--
# RUN: at line 2
/Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/22/include -nostdsysteminc -triple=x86_64 -emit-llvm -verify -std=c99 /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/stdckdint.c -o - | /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/FileCheck /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/stdckdint.c
# executed command: /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang -cc1 -internal-isystem /Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/lib/clang/22/include -nostdsysteminc -triple=x86_64 -emit-llvm -verify -std=c99 /Volumes/RAMDisk/buildbot-root/x86_64-darwin/llvm-project/clang/test/Headers/stdckdint.c -o -
# .---command stderr------------
# | Could not create process (/Volumes/RAMDisk/buildbot-root/x86_64-darwin/build/bin/clang) due to [Errno 24] Too many open files
# `-----------------------------
# error: command failed with exit status: 127

--

********************


vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
…up (llvm#165963)

Optimize the FactManager and DataflowAnalysis classes by using vector-based storage with ID-based lookups instead of maps.

- Added a `FactID` type using the `utils::ID` template to uniquely identify facts
- Modified `Fact` class to store and manage IDs
- Changed `FactManager` to use vector-based storage indexed by block ID instead of a map
- Updated `DataflowAnalysis` to use vector-based storage for states instead of maps
- Modified lookups to use ID-based indexing for better performance

Improves compile time hit on long-tail targets like `tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/TargetBuiltins/RISCV.cpp.o`​ from [21%](http://llvm-compile-time-tracker.com/compare_clang.php?from=6e25a04027ca786b7919657c7df330a33985ceea&to=20b42efa277c8b1915db757863e1fc26531cfd53&stat=instructions%3Au&sortBy=absolute-difference) to [3.2%](http://llvm-compile-time-tracker.com/compare_clang.php?from=6e25a04027ca786b7919657c7df330a33985ceea&to=d2d1cd1109c3a85344457bfff6f092ae7b96b211&stat=instructions%3Au&sortBy=absolute-difference)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:analysis clang:temporal-safety Issue/FR relating to the lifetime analysis in Clang (-Wdangling, -Wreturn-local-addr) clang Clang issues not falling into any other category

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants