Wan2.2-T2V-A14B ModelOpt FP8 Transformer Override for SGLang
This repository contains an SGLang-ready FP8 override for the primary transformer of Wan-AI/Wan2.2-T2V-A14B-Diffusers.
Important scope note:
- base model: Wan-AI/Wan2.2-T2V-A14B-Diffusers
- quantized component in this repo: the primary transformer (see the verification sketch after this list)
- transformer_2 remains BF16 from the base model
- intended usage: SGLang --transformer-path
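To confirm which component is actually quantized, one way is to count the tensor dtypes in the downloaded safetensors shards. This is a minimal sketch, not part of the model card: the repo id is real, but treating every *.safetensors file as part of the transformer is an assumption about the repo layout.

```python
# Minimal sketch: count tensor dtypes in the override's safetensors shards.
# Assumption: all *.safetensors files in the repo belong to the FP8 transformer.
import collections
import glob
import os

from huggingface_hub import snapshot_download
from safetensors import safe_open

repo_dir = snapshot_download("BBuf/wan22-t2v-a14b-modelopt-fp8-sglang-transformer")

dtype_counts = collections.Counter()
for shard in glob.glob(os.path.join(repo_dir, "**", "*.safetensors"), recursive=True):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            dtype_counts[f.get_slice(name).get_dtype()] += 1

# For an FP8 override, expect mostly F8_E4M3 weights alongside higher-precision scales.
print(dict(dtype_counts))
```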
Example:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True sglang generate \
  --model-path Wan-AI/Wan2.2-T2V-A14B-Diffusers \
  --transformer-path BBuf/wan22-t2v-a14b-modelopt-fp8-sglang-transformer \
  --prompt "A cat and a dog baking a cake together in a kitchen." \
  --720p --num-frames 81 --seed 42 \
  --num-gpus 4 --enable-cfg-parallel --ulysses-degree 2 \
  --text-encoder-cpu-offload --pin-cpu-memory \
  --dit-cpu-offload false --dit-layerwise-offload false \
  --save-output
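For offline or cached runs, one option is to pre-download both repos and pass local paths instead of Hub repo ids. This is a sketch under an assumption the card does not state, namely that --model-path and --transformer-path accept local directories:

```python
# Minimal sketch: pre-download both repos so the generate command can run
# without hitting the Hub. Passing local directories to --model-path and
# --transformer-path is an assumption, not confirmed by the card.
from huggingface_hub import snapshot_download

base_dir = snapshot_download("Wan-AI/Wan2.2-T2V-A14B-Diffusers")
transformer_dir = snapshot_download("BBuf/wan22-t2v-a14b-modelopt-fp8-sglang-transformer")

print(f"--model-path {base_dir}")
print(f"--transformer-path {transformer_dir}")
```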
Validated notes:
- H100 exact-nightly parity benchmark on 4 GPUs with no torch.compile
- benchmarked path keeps transformer_2 in BF16 from the base checkpoint
- exact-nightly benchmark: about a 3.68% total speedup and a 3.83% denoise speedup versus BF16
- reduced-smoke trajectory validation: selected-step latent cosine about 0.9755 (see the sketch after this list)
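For context on the 0.9755 figure, here is a minimal sketch of how a selected-step latent cosine can be computed. The actual validation harness is not published here, so the tensor shapes and pairing are illustrative assumptions:

```python
# Minimal sketch: cosine similarity between paired denoise-step latents from a
# BF16 reference run and the FP8 run. Shapes below are illustrative assumptions.
import torch

def latent_cosine(ref: torch.Tensor, test: torch.Tensor) -> float:
    # Flatten both latents and compare in FP32 to avoid low-precision artifacts.
    r, t = ref.flatten().float(), test.flatten().float()
    return torch.nn.functional.cosine_similarity(r, t, dim=0).item()

# Stand-in latents shaped [C, T, H, W]; real values come from the two pipelines
# at the same selected denoise step, prompt, and seed.
ref_latent = torch.randn(16, 21, 90, 160)
fp8_latent = ref_latent + 0.05 * torch.randn_like(ref_latent)
print(f"selected-step latent cosine: {latent_cosine(ref_latent, fp8_latent):.4f}")
```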