@Roman-Pevnyi Roman-Pevnyi commented Jul 30, 2025

Summary

Currently, UniformQuantizedType supports only built-in MLIR storage types such as Integer. Recent LLM quantization research has introduced NF4 (4-bit NormalFloat) as a low-precision data type (see https://arxiv.org/pdf/2305.14314). As more storage types are added, the system needs to become more extensible and maintainable. Native NF4 support in MLIR through a clean, extensible interface is essential for both current and future quantization workflows.

Current Approach and Its Limitations:

  • The present implementation relies on dynamic checks (e.g., type switches or if-else chains) to determine the storage type and retrieve type-specific information for legality checks.
  • This approach works for a small, fixed set of types, but as the number of supported types grows, the code becomes harder to read, maintain, and extend.
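The maintenance problem described above can be illustrated with a minimal, self-contained sketch. It does not use MLIR headers and is not the code from this PR: `StorageKind` and `defaultMaximum` are hypothetical names, and the sketch assumes the helper only needs the largest representable value per storage kind.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins for storage type kinds; these are not MLIR classes.
enum class StorageKind { Integer, Float8E5M2, Float8E4M3FN };

// Central dispatch: every new storage kind needs another branch here,
// and in every other helper written in the same style.
inline int64_t defaultMaximum(StorageKind kind, bool isSigned, unsigned width) {
  if (kind == StorageKind::Integer) {
    assert(width >= 1 && width <= 32 && "sketch only handles 1..32 bits");
    return isSigned ? (int64_t{1} << (width - 1)) - 1
                    : (int64_t{1} << width) - 1;
  }
  if (kind == StorageKind::Float8E5M2)
    return 57344; // largest finite Float8E5M2 value
  if (kind == StorageKind::Float8E4M3FN)
    return 448; // largest finite Float8E4M3FN value
  throw std::invalid_argument("unsupported storage kind");
}
```

Adding NF4 in this style would mean extending the enum and editing each such function, which is the coupling the proposal aims to remove.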

Proposed Interface-Based Approach:

  • Define a StorageTypeInterface that specifies the required methods any storage type must implement to be used in UniformQuantizedType.
  • Each storage type (Integer, Float8E5M2, Float8E4M3FN, and new types like NF4) would implement this interface, encapsulating its type-specific logic.
  • When UniformQuantizedType needs to check legality or retrieve information, it can use MLIR’s dyn_cast mechanism to check if the type implements the interface and then call the required methods.
  • This design decouples UniformQuantizedType from the specifics of each storage type, making it easy to add new types (such as NF4) without modifying the core logic or introducing more type checks.
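The interface-based design above might look roughly like the following self-contained sketch. It is plain C++ rather than MLIR: `Type`, `IntegerStorage`, `NF4Storage`, `OpaqueType`, and `isLegalStorageRange` are hypothetical, `dynamic_cast` stands in for MLIR's `dyn_cast`, and the method names are borrowed from the diff excerpts in this conversation while their bodies are assumptions. In particular, treating NF4 as a 4-bit code in the range [0, 15] is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical root of the type hierarchy; stands in for mlir::Type.
struct Type {
  virtual ~Type() = default;
};

// Hypothetical interface that every storage type would implement.
struct StorageTypeInterface {
  virtual ~StorageTypeInterface() = default;
  virtual int64_t getDefaultMaximum(bool isSigned, unsigned width) const = 0;
  virtual int64_t getDefaultMinimum(bool isSigned, unsigned width) const = 0;
  virtual std::string printStorageType(bool isSigned, unsigned width) const = 0;
};

// Integer storage: range follows two's-complement or unsigned limits.
struct IntegerStorage : Type, StorageTypeInterface {
  int64_t getDefaultMaximum(bool isSigned, unsigned width) const override {
    assert(width >= 1 && width <= 32 && "sketch only handles 1..32 bits");
    return isSigned ? (int64_t{1} << (width - 1)) - 1
                    : (int64_t{1} << width) - 1;
  }
  int64_t getDefaultMinimum(bool isSigned, unsigned width) const override {
    assert(width >= 1 && width <= 32 && "sketch only handles 1..32 bits");
    return isSigned ? -(int64_t{1} << (width - 1)) : 0;
  }
  std::string printStorageType(bool isSigned, unsigned width) const override {
    return (isSigned ? "i" : "u") + std::to_string(width);
  }
};

// Assumption: NF4 is modeled as a 4-bit index into a 16-entry table.
struct NF4Storage : Type, StorageTypeInterface {
  int64_t getDefaultMaximum(bool, unsigned) const override { return 15; }
  int64_t getDefaultMinimum(bool, unsigned) const override { return 0; }
  std::string printStorageType(bool, unsigned) const override { return "nf4"; }
};

// A type that does not implement the interface.
struct OpaqueType : Type {};

// Core logic: no per-type branches, only an interface query.
inline bool isLegalStorageRange(const Type &storage, bool isSigned,
                                unsigned width, int64_t min, int64_t max) {
  const auto *iface = dynamic_cast<const StorageTypeInterface *>(&storage);
  if (!iface)
    return false; // not usable as a storage type
  return min <= max && min >= iface->getDefaultMinimum(isSigned, width) &&
         max <= iface->getDefaultMaximum(isSigned, width);
}
```

With this shape, adding another storage type means adding one class that implements the interface; `isLegalStorageRange` stays unchanged.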

Benefits:

  • Extensibility: New storage types can be added by simply implementing the interface, without touching the core UniformQuantizedType logic.
  • Readability: The code is cleaner, as it avoids large switch statements or if-else chains.
  • Maintainability: Type-specific logic is encapsulated within each type, reducing the risk of errors and making the codebase easier to understand and update.

int64_t getDefaultMinimum(bool isSigned, unsigned integralWidth) const {
  return -getDefaultMaximum(isSigned, integralWidth);
}
std::string printStorageType([[maybe_unused]] bool isSigned, [[maybe_unused]] unsigned storageWidth) const {
Contributor

Not sure we should have a print method here. Isn't there a more canonical way to stringify the type name?
This is more of a getStorageTypeName.

}
return llvm::maxUIntN(integralWidth);
}
std::string printStorageType(bool isSigned, unsigned storageWidth) const {
Contributor

Do we need to pass these arguments from outside?
Isn't there some existing interface we can use here to get these directly from this?
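One possible reading of the two review comments above, sketched in plain C++: the storage type object owns its width and signedness, so a name query needs no arguments. The class `IntegerStorage`, its members, and the `getStorageTypeName` body are hypothetical and do not reflect existing MLIR API or the final code of this PR.

```cpp
#include <string>

// Hypothetical storage type that carries its own parameters; not an MLIR class.
class IntegerStorage {
public:
  IntegerStorage(unsigned width, bool isSigned)
      : width_(width), isSigned_(isSigned) {}

  unsigned getWidth() const { return width_; }
  bool isSigned() const { return isSigned_; }

  // Name query that reads everything it needs from the object itself.
  std::string getStorageTypeName() const {
    return (isSigned_ ? "i" : "u") + std::to_string(width_);
  }

private:
  unsigned width_;
  bool isSigned_;
};
```

Whether this is feasible depends on where signedness actually lives in the Quant dialect; if it is a property of the quantized type rather than of the storage type, some argument would still have to come from the caller.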

…ck) (intel#142)

Add a new API to access all blobs that are stored in the blob manager.
The main purpose (as of now) is to allow users of dialect resources to
iterate over all blobs, especially when the blobs are no longer used in
IR (e.g. the operation that uses the blob is deleted) and thus cannot be
easily accessed without manual tracking of keys.
@Roman-Pevnyi Roman-Pevnyi force-pushed the EISW-158454_refactor_palletization branch from d6153a3 to 9ad9c9f Compare July 31, 2025 14:13
Contributor

@ZoranZomborat ZoranZomborat left a comment

Proposal looks great! Let's also get some feedback on LLVM Discourse and motivate it with our use cases.

@Roman-Pevnyi Roman-Pevnyi changed the title from "[EISW-158454] Refactor palletization to use quant.uniform instead of quant.quantile" to "Extending UniformQuantizedType with interface-based support for new storage types in Quant dialect" on Aug 7, 2025