[NFC][IR2Vec] Minor refactoring of opcode access in vocabulary #147585

@svkeerthy svkeerthy commented Jul 8, 2025

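Refactored IR2Vec vocabulary handling to improve code organization and error handling. The opcode-to-key lookup moves into a new Vocabulary::getVocabKeyForOpcode helper, which both getStringKey and IR2VecVocabAnalysis::generateNumMappedVocab now call, and getVocabKeyForOperandKind returns "UnknownOperand" instead of hitting llvm_unreachable. This should help upcoming PRs related to the IR2Vec tool.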

(Tracking issue - #141817)

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch from 5408737 to 80c29d9 Compare July 8, 2025 19:33
@svkeerthy svkeerthy force-pushed the users/svkeerthy/06-20-vocab_changes branch from 31590b9 to 70dcd29 Compare July 8, 2025 19:33
@svkeerthy svkeerthy marked this pull request as ready for review July 9, 2025 22:47
@llvmbot llvmbot added mlgo llvm:analysis Includes value tracking, cost tables and constant folding labels Jul 9, 2025
llvmbot commented Jul 9, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-mlgo

Author: S. VenkataKeerthy (svkeerthy)

Full diff: https://github.com/llvm/llvm-project/pull/147585.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Analysis/IR2Vec.h (+6-3)
  • (modified) llvm/lib/Analysis/IR2Vec.cpp (+24-21)
diff --git a/llvm/include/llvm/Analysis/IR2Vec.h b/llvm/include/llvm/Analysis/IR2Vec.h
index f5a4e450cf160..176cdaf7b5378 100644
--- a/llvm/include/llvm/Analysis/IR2Vec.h
+++ b/llvm/include/llvm/Analysis/IR2Vec.h
@@ -163,15 +163,18 @@ class Vocabulary {
   static constexpr unsigned MaxOperandKinds =
       static_cast<unsigned>(OperandKind::MaxOperandKind);
 
+  /// Helper function to get vocabulary key for a given Opcode
+  static StringRef getVocabKeyForOpcode(unsigned Opcode);
+
+  /// Helper function to get vocabulary key for a given TypeID
+  static StringRef getVocabKeyForTypeID(Type::TypeID TypeID);
+
   /// Helper function to get vocabulary key for a given OperandKind
   static StringRef getVocabKeyForOperandKind(OperandKind Kind);
 
   /// Helper function to classify an operand into OperandKind
   static OperandKind getOperandKind(const Value *Op);
 
-  /// Helper function to get vocabulary key for a given TypeID
-  static StringRef getVocabKeyForTypeID(Type::TypeID TypeID);
-
 public:
   Vocabulary() = default;
   Vocabulary(VocabVector &&Vocab);
diff --git a/llvm/lib/Analysis/IR2Vec.cpp b/llvm/lib/Analysis/IR2Vec.cpp
index d3dc2e36fd23e..f97644b93a3d4 100644
--- a/llvm/lib/Analysis/IR2Vec.cpp
+++ b/llvm/lib/Analysis/IR2Vec.cpp
@@ -243,6 +243,17 @@ const ir2vec::Embedding &Vocabulary::operator[](const Value *Arg) const {
   return Vocab[MaxOpcodes + MaxTypeIDs + static_cast<unsigned>(ArgKind)];
 }
 
+StringRef Vocabulary::getVocabKeyForOpcode(unsigned Opcode) {
+  assert(Opcode >= 1 && Opcode <= MaxOpcodes && "Invalid opcode");
+#define HANDLE_INST(NUM, OPCODE, CLASS)                                        \
+  if (Opcode == NUM) {                                                         \
+    return #OPCODE;                                                            \
+  }
+#include "llvm/IR/Instruction.def"
+#undef HANDLE_INST
+  return "UnknownOpcode";
+}
+
 StringRef Vocabulary::getVocabKeyForTypeID(Type::TypeID TypeID) {
   switch (TypeID) {
   case Type::VoidTyID:
@@ -280,6 +291,7 @@ StringRef Vocabulary::getVocabKeyForTypeID(Type::TypeID TypeID) {
   default:
     return "UnknownTy";
   }
+  return "UnknownTy";
 }
 
 // Operand kinds supported by IR2Vec - string mappings
@@ -297,9 +309,9 @@ StringRef Vocabulary::getVocabKeyForOperandKind(Vocabulary::OperandKind Kind) {
     OPERAND_KINDS
 #undef OPERAND_KIND
   case Vocabulary::OperandKind::MaxOperandKind:
-    llvm_unreachable("Invalid OperandKind");
+    return "UnknownOperand";
   }
-  llvm_unreachable("Unknown OperandKind");
+  return "UnknownOperand";
 }
 
 #undef OPERAND_KINDS
@@ -332,14 +344,8 @@ StringRef Vocabulary::getStringKey(unsigned Pos) {
   assert(Pos < MaxOpcodes + MaxTypeIDs + MaxOperandKinds &&
          "Position out of bounds in vocabulary");
   // Opcode
-  if (Pos < MaxOpcodes) {
-#define HANDLE_INST(NUM, OPCODE, CLASS)                                        \
-  if (Pos == NUM - 1) {                                                        \
-    return #OPCODE;                                                            \
-  }
-#include "llvm/IR/Instruction.def"
-#undef HANDLE_INST
-  }
+  if (Pos < MaxOpcodes)
+    return getVocabKeyForOpcode(Pos + 1);
   // Type
   if (Pos < MaxOpcodes + MaxTypeIDs)
     return getVocabKeyForTypeID(static_cast<Type::TypeID>(Pos - MaxOpcodes));
@@ -447,21 +453,18 @@ void IR2VecVocabAnalysis::generateNumMappedVocab() {
   // Handle Opcodes
   std::vector<Embedding> NumericOpcodeEmbeddings(Vocabulary::MaxOpcodes,
                                                  Embedding(Dim, 0));
-#define HANDLE_INST(NUM, OPCODE, CLASS)                                        \
-  {                                                                            \
-    auto It = OpcVocab.find(#OPCODE);                                          \
-    if (It != OpcVocab.end())                                                  \
-      NumericOpcodeEmbeddings[NUM - 1] = It->second;                           \
-    else                                                                       \
-      handleMissingEntity(#OPCODE);                                            \
+  for (unsigned Opcode : seq(0u, Vocabulary::MaxOpcodes)) {
+    StringRef VocabKey = Vocabulary::getVocabKeyForOpcode(Opcode + 1);
+    auto It = OpcVocab.find(VocabKey.str());
+    if (It != OpcVocab.end())
+      NumericOpcodeEmbeddings[Opcode] = It->second;
+    else
+      handleMissingEntity(VocabKey.str());
   }
-#include "llvm/IR/Instruction.def"
-#undef HANDLE_INST
   Vocab.insert(Vocab.end(), NumericOpcodeEmbeddings.begin(),
                NumericOpcodeEmbeddings.end());
 
-  // Handle Types using direct iteration through TypeID enum
-  // We iterate through all possible TypeID values and map them to embeddings
+  // Handle Types
   std::vector<Embedding> NumericTypeEmbeddings(Vocabulary::MaxTypeIDs,
                                                Embedding(Dim, 0));
   for (unsigned TypeID : seq(0u, Vocabulary::MaxTypeIDs)) {
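
The diff above replaces two inline expansions of HANDLE_INST with one helper that maps a 1-based opcode to its key, while callers keep using 0-based vocabulary positions. The following is a minimal, self-contained sketch of that pattern, not the LLVM code itself. DEMO_INSTRUCTIONS is a hypothetical stand-in for llvm/Instruction.def-style entries, std::string_view stands in for StringRef, and buildOpcodeEmbeddings with its Missing list is an assumed simplification of generateNumMappedVocab and handleMissingEntity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for llvm/IR/Instruction.def: (number, name) pairs,
// numbered from 1 as in the diff above.
#define DEMO_INSTRUCTIONS(X)                                                   \
  X(1, Ret)                                                                    \
  X(2, Br)                                                                     \
  X(3, Add)                                                                    \
  X(4, Load)

constexpr unsigned MaxOpcodes = 4;

// Maps a 1-based opcode to its vocabulary key. The X-macro expands into a
// chain of `if (Opcode == N) return "Name";` statements.
std::string_view getVocabKeyForOpcode(unsigned Opcode) {
  assert(Opcode >= 1 && Opcode <= MaxOpcodes && "Invalid opcode");
#define HANDLE_INST(NUM, OPCODE)                                               \
  if (Opcode == NUM)                                                           \
    return #OPCODE;
  DEMO_INSTRUCTIONS(HANDLE_INST)
#undef HANDLE_INST
  // Fallback when asserts are compiled out and no entry matched.
  return "UnknownOpcode";
}

using Embedding = std::vector<double>;

// Builds a dense table indexed by 0-based position from a string-keyed
// vocabulary. Keys absent from OpcVocab keep a zero embedding and are
// appended to Missing (an assumed simplification of handleMissingEntity).
std::vector<Embedding>
buildOpcodeEmbeddings(const std::map<std::string, Embedding> &OpcVocab,
                      unsigned Dim, std::vector<std::string> &Missing) {
  std::vector<Embedding> Result(MaxOpcodes, Embedding(Dim, 0.0));
  for (unsigned Pos = 0; Pos < MaxOpcodes; ++Pos) {
    // Position Pos corresponds to opcode Pos + 1.
    std::string Key(getVocabKeyForOpcode(Pos + 1));
    auto It = OpcVocab.find(Key);
    if (It != OpcVocab.end())
      Result[Pos] = It->second;
    else
      Missing.push_back(Key);
  }
  return Result;
}
```

Keeping the off-by-one conversion (Pos + 1) at the call sites means the helper only ever sees real opcode numbers, so the instruction list is expanded in exactly one place.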

@svkeerthy svkeerthy force-pushed the users/svkeerthy/06-20-vocab_changes branch from 70dcd29 to 237e4d2 Compare July 14, 2025 17:40
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch from 80c29d9 to 5eaecce Compare July 14, 2025 17:40
Base automatically changed from users/svkeerthy/06-20-vocab_changes to main July 14, 2025 18:07
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch 2 times, most recently from bebdb9e to 28b3901 Compare July 14, 2025 20:45
svkeerthy commented Jul 14, 2025

Merge activity

  • Jul 14, 11:28 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jul 14, 11:30 PM UTC: Graphite rebased this pull request as part of a merge.
  • Jul 14, 11:33 PM UTC: Graphite rebased this pull request as part of a merge.
  • Jul 14, 11:35 PM UTC: @svkeerthy merged this pull request with Graphite.

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch from 28b3901 to ee12344 Compare July 14, 2025 23:29
@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch from ee12344 to a466c28 Compare July 14, 2025 23:32
@svkeerthy svkeerthy merged commit 8ae8b50 into main Jul 14, 2025
7 of 9 checks passed
@svkeerthy svkeerthy deleted the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch July 14, 2025 23:35