Wan2.2-T2V-A14B ModelOpt FP8 Transformer Override for SGLang

This repository contains an SGLang-ready FP8 override for the primary transformer of Wan-AI/Wan2.2-T2V-A14B-Diffusers.

Important scope note:

  • base model: Wan-AI/Wan2.2-T2V-A14B-Diffusers
  • quantized component in this repo: primary transformer
  • transformer_2 remains BF16 from the base model
  • intended usage: SGLang --transformer-path
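For context, ModelOpt-style FP8 quantization typically stores per-tensor scales that map each weight tensor's maximum magnitude into the FP8 E4M3 range (largest finite value 448). The sketch below illustrates only that scale computation in pure Python; the function names are hypothetical and this is not ModelOpt's actual API — the real flow casts `w / scale` to `torch.float8_e4m3fn` and keeps the scale alongside the checkpoint.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def per_tensor_fp8_scale(weights):
    """Hypothetical sketch: per-tensor scale so max |w| maps to the E4M3 max."""
    amax = max(abs(w) for w in weights)
    return amax / FP8_E4M3_MAX

# Example: a tensor whose largest magnitude is 896 gets scale 2.0,
# so dividing by the scale brings every value into the E4M3 range.
scale = per_tensor_fp8_scale([-896.0, 224.0, 0.5])
```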

Example:

  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  sglang generate \
    --model-path Wan-AI/Wan2.2-T2V-A14B-Diffusers \
    --transformer-path BBuf/wan22-t2v-a14b-modelopt-fp8-sglang-transformer \
    --prompt "A cat and a dog baking a cake together in a kitchen." \
    --720p --num-frames 81 --seed 42 --num-gpus 4 \
    --enable-cfg-parallel --ulysses-degree 2 \
    --text-encoder-cpu-offload --pin-cpu-memory \
    --dit-cpu-offload false \
    --dit-layerwise-offload false \
    --save-output

Validation notes:

  • exact-nightly parity benchmark on 4x H100, no torch.compile
  • the benchmarked path keeps transformer_2 in BF16 from the base checkpoint
  • exact-nightly benchmark: about a 3.68% total speedup and a 3.83% denoise speedup versus BF16
  • reduced-step smoke trajectory validation: latent cosine similarity at selected steps of about 0.9755
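The latent-cosine check above compares flattened intermediate latents from the FP8 and BF16 denoise trajectories at selected steps. A minimal sketch of that metric, with plain Python lists standing in for latent tensors (this is an illustration of the metric, not the repo's validation script):

```python
import math

def latent_cosine(a, b):
    """Cosine similarity between two flattened latents (plain lists here)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical latents score 1.0; a value near 0.9755 indicates the FP8
# trajectory stays closely aligned with the BF16 reference.
similarity = latent_cosine([0.1, -0.4, 0.7], [0.1, -0.4, 0.7])
```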