```bash
pip3 install torch==2.9.0 # >= 2.7.1
pip3 install -U "cache-dit[all]" # >= 1.0.9
pip3 install git+https://github.com/huggingface/diffusers.git # latest main
```

We have released a Hybrid Acceleration example (📚qwen_image_fast.py) in this repo that speeds up Qwen-Image by ~4.8x🎉. It combines Hybrid Cache Acceleration, Context Parallelism, FP8 Weight Only quantization, and Torch Compile. Feel free to give it a try. For example:
```bash
# Baseline (NVIDIA L20 48GiB, ~120s w/ Model CPU Offload)
python3 qwen_image_fast.py --height 1024 --width 1024
# + Hybrid Cache (DBCache + TaylorSeer)
# + Context Parallelism (Ulysses)
# + FP8 Weight Only (CPU offload no longer required)
# + Torch Compile (2x NVIDIA L20, ~25s, ~4.8x speedup)
torchrun --nproc_per_node=2 qwen_image_fast.py \
         --height 1024 --width 1024 \
         --parallel-type ulysses --quantize \
         --cache --Fn 1 --rdt 0.12 --mcc 2 --taylorseer \
         --compile
```

| 🤖Baseline w/o Acceleration | 🎉w/ Hybrid Acceleration |
|---|---|
| ~120s, 60+ GiB per GPU | ~25s, ~4.8x speedup, 36 GiB per GPU | 
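The flags `--cache --Fn 1 --rdt 0.12 --mcc 2 --taylorseer` combine DBCache with TaylorSeer. As a rough illustration of the per-step decision such a hybrid cache makes, here is a minimal pure-Python sketch. It assumes `--Fn` is the number of leading transformer blocks computed on every step, `--rdt` is the residual-difference threshold below which the remaining blocks are skipped, and `--mcc` is the maximum number of consecutive cached steps. On a skipped step, the remaining blocks' residual is extrapolated with a first-order Taylor step (TaylorSeer-style). The names `HybridCacheState`, `forward_step`, and `rel_l1_diff` are hypothetical, the "blocks" are toy functions on plain lists, and this is not cache-dit's actual implementation or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Vec = List[float]
Block = Callable[[Vec], Vec]


def rel_l1_diff(a: Vec, b: Vec) -> float:
    """Relative L1 distance between a and b, normalized by the magnitude of b."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(abs(y) for y in b)
    return num / den if den > 0 else float("inf")


@dataclass
class HybridCacheState:
    fn: int = 1        # assumed meaning of --Fn: leading blocks computed every step
    rdt: float = 0.12  # assumed meaning of --rdt: residual diff threshold
    mcc: int = 2       # assumed meaning of --mcc: max consecutive cached steps
    prev_front: Optional[Vec] = None  # front-block residual from the previous step
    tail: Optional[Vec] = None        # tail-block residual at the last full step
    tail_deriv: Optional[Vec] = None  # finite-difference slope of the tail residual
    last_full_step: int = -1
    cached_in_a_row: int = 0
    skipped_steps: List[int] = field(default_factory=list)


def forward_step(blocks: List[Block], h: Vec, step: int, st: HybridCacheState) -> Vec:
    # 1) Always run the first `fn` blocks and measure how much their residual changed.
    x = h
    for blk in blocks[: st.fn]:
        x = blk(x)
    front = [a - b for a, b in zip(x, h)]
    can_cache = (
        st.prev_front is not None
        and st.tail is not None
        and st.cached_in_a_row < st.mcc
        and rel_l1_diff(front, st.prev_front) < st.rdt
    )
    st.prev_front = front

    if can_cache:
        # 2a) Skip the remaining blocks; predict their residual with a first-order
        #     Taylor step from the last full step (zero-order reuse if no slope yet).
        dt = step - st.last_full_step
        slope = st.tail_deriv if st.tail_deriv is not None else [0.0] * len(x)
        st.cached_in_a_row += 1
        st.skipped_steps.append(step)
        return [xi + r + dt * d for xi, r, d in zip(x, st.tail, slope)]

    # 2b) Run the remaining blocks and refresh the cached residual and its slope.
    y = x
    for blk in blocks[st.fn :]:
        y = blk(y)
    tail = [a - b for a, b in zip(y, x)]
    if st.tail is not None:
        dt = step - st.last_full_step
        st.tail_deriv = [(n - o) / dt for n, o in zip(tail, st.tail)]
    st.tail = tail
    st.last_full_step = step
    st.cached_in_a_row = 0
    return y


# Toy usage: three "blocks" that scale their input, and inputs that drift slowly.
blocks = [lambda h, s=s: [v * (1.0 + s) for v in h] for s in (0.10, 0.05, 0.02)]
state = HybridCacheState(fn=1, rdt=0.12, mcc=2)
for step in range(6):
    forward_step(blocks, [1.0 + 0.01 * step, 2.0], step, state)
print(state.skipped_steps)  # [1, 2, 4, 5]
```

In this toy, `mcc=2` forces a full step after two cached ones (step 3), and setting `rdt=0.0` disables skipping entirely. Lowering either value trades speed for fidelity.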
This repo is based on cache-dit and diffusers. Many thanks to these awesome open-source projects.