# daVinci-MagiHuman FP8 (E4M3)
Static FP8 (float8_e4m3fn) quantized weights for GAIR/daVinci-MagiHuman, a state-of-the-art audio-video generation model.
## What's Included
| Directory | Description | Original Size | FP8 Size | Compression |
|---|---|---|---|---|
| `distill_fp8/` | Distilled DiT model (8-step, no CFG) | 61.2 GB | 15.3 GB | 4.0x |
| `base_fp8/` | Full base DiT model (32-step, CFG=2) | 30.6 GB | 15.3 GB | 2.0x |
| `540p_sr_fp8/` | 540p super-resolution DiT | 61.2 GB | 15.3 GB | 4.0x |
| `1080p_sr_fp8/` | 1080p super-resolution DiT | 61.2 GB | 15.3 GB | 4.0x |
**Total savings:** ~214 GB → ~61 GB on disk.

> **Note:** The distill and SR models store weights in FP32 on disk (hence the 4x compression), while the base model stores them in BF16 (hence 2x). All share the same ~15.3 GB FP8 size and identical VRAM usage.
## Quantization Details
- **Method:** Per-tensor absmax static quantization
- **Weight dtype:** `torch.float8_e4m3fn` for large linear layer weights (≥2D, ≥1M elements)
- **Scale dtype:** `torch.float32` (one scalar per weight tensor, stored as `weight.__fp8_scale`)
- **Preserved in BF16:** Norms (RMSNorm), biases, embeddings, small tensors, and non-linear-layer weights
- **Average relative quantization error:** ~2.25%
- **Inference:** Dynamic per-tensor FP8 quantization of activations + `torch._scaled_mm` for FP8 tensor-core matmuls
- **Quantized tensors per model:** 160 (all `BaseLinear` and `NativeMoELinear` weight matrices)
## VRAM Usage
With FP8 weights, the DiT model loads at ~14.3 GB (down from ~28.5 GB in BF16, or ~76.4 GB for the FP32-on-disk originals). Combined with the text encoder, VAEs, and audio model, total VRAM usage is approximately:
- Base generation only: ~35 GB
- Base + 540p SR: ~49 GB
- Base + 1080p SR: ~49 GB
## Performance
Tested on NVIDIA GB10 (Blackwell) with PyTorch 2.9.0+cu128:
- **Speed:** ~2.5x faster inference than BF16 (FP8 tensor-core acceleration via `torch._scaled_mm`)
- **Quality:** Visually comparable to BF16 generation at ~2.25% weight quantization error
## Usage
These FP8 weights are designed to be used with the modified inference code from daVinci-MagiHuman that includes FP8 BaseLinear support. The key changes needed in the inference code:
- **`BaseLinear`/`NativeMoELinear` in `dit_module.py`:** Add FP8 detection (`weight.dtype == torch.float8_e4m3fn`) and route through `torch._scaled_mm` with dynamic input quantization. Weight dimensions are padded to multiples of 16 at runtime for `_scaled_mm` compatibility.
- **Checkpoint loader in `load_model_checkpoint.py`:** Remap `key.__fp8_scale` → `module.weight_scale` buffers, and re-install the FP8 weight parameters after `load_state_dict` (which auto-casts them to BF16).
## Quick Start

```bash
# Download just the distilled model (recommended for faster generation)
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --include "distill_fp8/*" \
    --local-dir /path/to/models/daVinci-MagiHuman

# Download the full base model (higher quality, 32 steps)
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --include "base_fp8/*" \
    --local-dir /path/to/models/daVinci-MagiHuman

# Download everything
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --local-dir /path/to/models/daVinci-MagiHuman
```
## Other Required Models (not included)
You still need the following from the original repos:
- **Turbo VAE:** `GAIR/daVinci-MagiHuman` (`turbo_vae/`)
- **Wan2.2 VAE:** `Wan-AI/Wan2.2-TI2V-5B` (`Wan2.2_VAE.pth`)
- **Stable Audio:** `stabilityai/stable-audio-open-1.0`
- **T5-Gemma:** `google/t5gemma-9b-9b-ul2`
## Conversion Script

The FP8 models were created with `convert_fp8.py`:
```bash
# Convert the distilled model
python convert_fp8.py \
    --input /path/to/daVinci-MagiHuman/distill \
    --output /path/to/daVinci-MagiHuman/distill_fp8

# Convert the full base model
python convert_fp8.py \
    --input /path/to/daVinci-MagiHuman/base \
    --output /path/to/daVinci-MagiHuman/base_fp8
```
The script is available in the daVinci-MagiHuman repository.
## Credits
- Original model: GAIR-NLP / daVinci-MagiHuman
- Paper: daVinci: A Joint Audio-Video Generation Model with Spatio-Temporal Consistency
- FP8 quantization: SanDiegoDude
## License
Apache 2.0 (same as the original model)