Overview
SmoothQuant and AWQ are built on a similar intuition: scale up weights and scale down activations. Their benchmarks can therefore share much of the same logic within the vLLM ecosystem, using vllm/lm-eval and vllm/benchmarks/benchmark_latency.py.
Ideally, the following user guides and docs can be updated to focus on vLLM:
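The shared intuition can be sketched numerically: applying a per-channel scale to the weights and its inverse to the activations leaves the matmul output unchanged while migrating outlier magnitude out of the activations. This is a minimal illustration only; the scale rule below (`alpha = 0.5` on the per-channel activation max) is an assumption for the sketch, not torchao's actual implementation:

```python
import numpy as np

# SmoothQuant/AWQ-style smoothing: Y = X @ W == (X / s) @ (s[:, None] * W).
# Dividing activations by s and multiplying weights by s preserves the
# output exactly, but shifts quantization difficulty onto the weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # activations (tokens x in-channels)
x[:, 0] *= 50.0                      # inject an activation outlier channel
w = rng.standard_normal((8, 16))     # weights (in-channels x out-channels)

# Illustrative scale: per-channel activation max softened by alpha = 0.5
# (the real methods derive scales from calibration data).
s = np.abs(x).max(axis=0) ** 0.5

y_ref = x @ w
y_smooth = (x / s) @ (s[:, None] * w)

print(np.allclose(y_ref, y_smooth))          # prints True: same output
print(np.abs(x / s).max() < np.abs(x).max()) # prints True: outliers tamed
```

Because the transform is output-preserving, any accuracy gap measured afterwards (e.g. via lm-eval) comes from quantization itself, not from the smoothing step.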
- AWQ: torchao/prototype/awq/example.py & torchao/prototype/awq/README.md
- SmoothQuant: torchao/prototype/smoothquant/example.py & torchao/prototype/smoothquant/README.md
Related Issue/PR
Resources
- https://blog.squeezebits.com/vllm-vs-tensorrtllm-2-towards-optimal-batching-for-llm-serving-31349
- https://vllm-ascend.readthedocs.io/en/latest/developer_guide/evaluation/using_lm_eval.html
- https://github.com/vllm-project/vllm/blob/main/benchmarks/benchmark_latency.py
- How to benchmark vLLM: a short tutorial (vllm-project/vllm#7181)