Support weight only quantization from bfloat16 to int8? #110

@Missmiaom

Description

An error occurs while quantizing the bf16 model:

tensorrt_llm_0.5.0/examples/gpt/weight.py:265

elif use_weight_only:
    processed_torch_weights, torch_weight_scales = torch.ops.fastertransformer.symmetric_quantize_last_axis_of_batched_matrix(
        torch.tensor(t), plugin_weight_only_quant_type)

Error:

can't convert np.ndarray of type numpy.void. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
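For context, the failure appears to come from numpy itself: numpy has no native bfloat16 dtype, so bf16 weights loaded from the checkpoint show up as raw 2-byte (void) buffers that torch.tensor() refuses to convert. Below is a minimal sketch of one possible workaround, assuming the weight t is such a raw 2-byte buffer and that the quantization op can be fed fp16 input; the helper name and the upcast to fp16 are assumptions for illustration, not part of weight.py:

import numpy as np
import torch

def bf16_bytes_to_torch(raw: np.ndarray) -> torch.Tensor:
    # numpy cannot represent bfloat16, so reinterpret each 2-byte element
    # as int16 first, then re-view the bytes as bfloat16 on the torch side.
    # Assumes raw.itemsize == 2; otherwise the view below will fail.
    as_int16 = np.ascontiguousarray(raw).view(np.int16)
    return torch.from_numpy(as_int16).view(torch.bfloat16)

# Hypothetical use in the use_weight_only branch, upcasting to fp16 before
# quantization since direct bf16 support in the op is not guaranteed:
# t_bf16 = bf16_bytes_to_torch(t)
# processed_torch_weights, torch_weight_scales = \
#     torch.ops.fastertransformer.symmetric_quantize_last_axis_of_batched_matrix(
#         t_bf16.to(torch.float16), plugin_weight_only_quant_type)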

Metadata

Labels

feature request (new feature or request, including new model, dtype, or functionality support); triaged (issue has been triaged by maintainers)
