🐛 [Bug] perf gap reduce on timm/ViT #3704

@zewenli98

Bug Description

Comparing the performance of Torch-TRT against ONNX-TRT:

In fp16:

  1. Skipping constant folding of the embedding layers does not affect engine size, latency, or precision.
  2. Disabling the linear decomposition and adding a linear converter reduces latency by ~15%.
  3. opt_level=3 and opt_level=5 give almost the same latency.
  4. onnx-trt takes much longer to compile.
  5. torch-trt is ~9% slower than onnx-trt.
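
For anyone trying to reproduce these numbers, here is a minimal sketch of how the latency figures and the relative gap could be measured. It is not the script used for this issue. The helper names `median_latency_ms` and `relative_gap_pct` are hypothetical, and the callables passed in are assumed to run one full inference and block until the GPU is done (for example by calling `torch.cuda.synchronize()` before returning).

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Median wall-clock latency of fn() in milliseconds.

    Assumption: fn() runs one inference and only returns once the work
    has finished (GPU callers should synchronize inside fn).
    """
    # Warm-up runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def relative_gap_pct(candidate_ms: float, baseline_ms: float) -> float:
    """Percent by which candidate is slower (positive) or faster (negative) than baseline."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0
```

With the torch-trt latency as `candidate_ms` and the onnx-trt latency as `baseline_ms`, a positive result from `relative_gap_pct` corresponds to the "slower than onnx-trt" figure in item 5. The median is used instead of the mean so that occasional slow iterations have less influence.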
