Conversation

@matichon-vultureprime (Contributor) commented May 25, 2024

Following [Feature Request: "Model Zoo" for quantization #1591],
this is our initial effort to create the Model Zoo.
The first model uploaded is Llama3-70B, AWQ-quantized.

While building the Model Zoo I ran into a wide configuration space: PP_size, TP_size, KV_cache_type (fp16, fp8, int8), Group_size (64, 128), and quantization algorithm (AWQ, SQ, FP8).
I will try to work out a "proper" base configuration.

For now I have chosen the smallest available Group_size (to minimize quantization degradation) and set PP_size to 1.
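To illustrate why a smaller group size tends to reduce quantization degradation, here is a minimal sketch of group-wise symmetric low-bit weight quantization. This is a hypothetical, self-contained example (the function name and setup are mine, not TensorRT-LLM code): each group of weights shares one scale, so finer groups adapt better to local weight magnitudes.

```python
# Hypothetical sketch, NOT TensorRT-LLM code: group-wise symmetric
# INT4-style fake quantization. Each group of `group_size` weights
# shares a single scale derived from that group's max magnitude.
import numpy as np

def quantize_groupwise(weights, group_size, n_bits=4):
    """Fake-quantize a 1-D weight vector group by group and return
    the dequantized result, so the quantization error can be measured."""
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 7 for 4-bit symmetric
    out = np.empty_like(weights)
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; the epsilon guards all-zero groups.
        scale = max(np.abs(group).max() / qmax, 1e-8)
        q = np.clip(np.round(group / scale), -qmax - 1, qmax)
        out[start:start + group_size] = q * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)

# Mean absolute error for group sizes 64 and 128: the finer
# grouping tracks local magnitudes better, so its error is lower here.
err_64 = np.abs(w - quantize_groupwise(w, 64)).mean()
err_128 = np.abs(w - quantize_groupwise(w, 128)).mean()
```

The trade-off, of course, is that smaller groups store more scales, slightly increasing the quantized model's size.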

Let's discuss whether we can settle on the "proper" configurations.

@byshiue (Collaborator) commented May 28, 2024

Thank you for the PR. We will merge it soon.

@byshiue byshiue self-requested a review May 28, 2024 01:07
@byshiue byshiue self-assigned this May 28, 2024
@byshiue byshiue added labels May 28, 2024: "triaged" (Issue has been triaged by maintainers), "Community want to contribute" (PRs initiated from Community)
@nv-guomingz (Collaborator) commented

Hi @matichon-vultureprime, thanks for your contribution. We've merged it into the code base and will add you to the contributor list.




