# daVinci-MagiHuman FP8 (E4M3)
Static FP8 (float8_e4m3fn) quantized weights for GAIR/daVinci-MagiHuman, a state-of-the-art audio-video generation model.
## What's Included
| Directory | Description | Original Size | FP8 Size | Compression |
|---|---|---|---|---|
| `distill_fp8/` | Distilled DiT model (8-step, no CFG) | 61.2 GB | 15.3 GB | 4.0x |
| `base_fp8/` | Full base DiT model (32-step, CFG=2) | 30.6 GB | 15.3 GB | 2.0x |
| `540p_sr_fp8/` | 540p super-resolution DiT | 61.2 GB | 15.3 GB | 4.0x |
| `1080p_sr_fp8/` | 1080p super-resolution DiT | 61.2 GB | 15.3 GB | 4.0x |
**Total savings:** ~214 GB → ~61 GB on disk.

> **Note:** The distill and SR models store weights in FP32 on disk (hence the 4x compression), while the base model stores them in BF16 (hence 2x). All share the same ~15.3 GB FP8 size and identical VRAM usage.
## Quantization Details
- **Method:** Per-tensor absmax static quantization
- **Weight dtype:** `torch.float8_e4m3fn` for large linear layer weights (≥2D, ≥1M elements)
- **Scale dtype:** `torch.float32` (one scalar per weight tensor, stored as `weight.__fp8_scale`)
- **Preserved in BF16:** Norms (RMSNorm), biases, embeddings, small tensors, and non-linear-layer weights
- **Average relative quantization error:** ~2.25%
- **Inference:** Dynamic per-tensor FP8 quantization of activations + `torch._scaled_mm` for FP8 tensor-core matmuls
- **Quantized tensors per model:** 160 (all `BaseLinear` and `NativeMoELinear` weight matrices)
## VRAM Usage
With FP8 weights, the DiT model loads at ~14.3 GB (down from ~28.5 GB in BF16, or ~76.4 GB for the FP32-on-disk originals). Combined with the text encoder, VAEs, and audio model, total VRAM usage is approximately:
- Base generation only: ~35 GB
- Base + 540p SR: ~49 GB
- Base + 1080p SR: ~49 GB
## Performance
Tested on NVIDIA GB10 (Blackwell) with PyTorch 2.9.0+cu128:
- **Speed:** ~2.5x faster inference than BF16 (FP8 tensor-core acceleration via `torch._scaled_mm`)
- **Quality:** Visually comparable to BF16 generation at ~2.25% weight quantization error
## Usage
These FP8 weights are designed to be used with the modified inference code from daVinci-MagiHuman that includes FP8 BaseLinear support. The key changes needed in the inference code:
- **`BaseLinear`/`NativeMoELinear` in `dit_module.py`:** Add FP8 detection (`weight.dtype == torch.float8_e4m3fn`) and route through `torch._scaled_mm` with dynamic input quantization. Weight dimensions are padded to multiples of 16 at runtime for `_scaled_mm` compatibility.
- **Checkpoint loader in `load_model_checkpoint.py`:** Remap `key.__fp8_scale` → `module.weight_scale` buffers, and re-install the FP8 weight parameters after `load_state_dict` (which auto-casts them to BF16).
## Quick Start

```bash
# Download just the distilled model (recommended for faster generation)
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --include "distill_fp8/*" \
    --local-dir /path/to/models/daVinci-MagiHuman

# Download the full base model (higher quality, 32 steps)
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --include "base_fp8/*" \
    --local-dir /path/to/models/daVinci-MagiHuman

# Download everything
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --local-dir /path/to/models/daVinci-MagiHuman
```
## Other Required Models (not included)
You still need the following from the original repos:
- **Turbo VAE:** `GAIR/daVinci-MagiHuman` (`turbo_vae/`)
- **Wan2.2 VAE:** `Wan-AI/Wan2.2-TI2V-5B` (`Wan2.2_VAE.pth`)
- **Stable Audio:** `stabilityai/stable-audio-open-1.0`
- **T5-Gemma:** `google/t5gemma-9b-9b-ul2`
## Conversion Script

The FP8 models were created with `convert_fp8.py`:
```bash
# Convert the distilled model
python convert_fp8.py \
    --input /path/to/daVinci-MagiHuman/distill \
    --output /path/to/daVinci-MagiHuman/distill_fp8

# Convert the full base model
python convert_fp8.py \
    --input /path/to/daVinci-MagiHuman/base \
    --output /path/to/daVinci-MagiHuman/base_fp8
```
The script is available in the daVinci-MagiHuman repository.
## Credits
- Original model: GAIR-NLP / daVinci-MagiHuman
- Paper: daVinci: A Joint Audio-Video Generation Model with Spatio-Temporal Consistency
- FP8 quantization: SanDiegoDude
## License
Apache 2.0 (same as the original model)