daVinci-MagiHuman FP8 (E4M3)

Static FP8 (float8_e4m3fn) quantized weights for GAIR/daVinci-MagiHuman, a state-of-the-art audio-video generation model.

What's Included

| Directory | Description | Original Size | FP8 Size | Compression |
|---|---|---|---|---|
| distill_fp8/ | Distilled DiT model (8-step, no CFG) | 61.2 GB | 15.3 GB | 4.0x |
| base_fp8/ | Full base DiT model (32-step, CFG=2) | 30.6 GB | 15.3 GB | 2.0x |
| 540p_sr_fp8/ | 540p super-resolution DiT | 61.2 GB | 15.3 GB | 4.0x |
| 1080p_sr_fp8/ | 1080p super-resolution DiT | 61.2 GB | 15.3 GB | 4.0x |

Total savings: ~214 GB → ~61 GB on disk

Note: The distill and SR models store their weights in FP32 on disk (hence the 4x compression), while the base model stores them in BF16 (hence 2x). All yield the same ~15.3 GB FP8 size and identical VRAM usage.
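The 4x and 2x ratios follow directly from bytes per parameter (4 for FP32, 2 for BF16, 1 for FP8); the FP32 scales, at one scalar per quantized tensor, are negligible. A quick sanity check (helper name is illustrative, not from the repo):

```python
bytes_per_param = {"fp32": 4, "bf16": 2, "fp8": 1}

def fp8_size_gb(original_gb: float, src_dtype: str) -> float:
    """Approximate on-disk size after FP8 conversion; the per-tensor
    FP32 scales (one scalar each) are negligible and ignored."""
    return original_gb * bytes_per_param["fp8"] / bytes_per_param[src_dtype]

print(fp8_size_gb(61.2, "fp32"))  # distill/SR: FP32 on disk -> ~15.3 GB
print(fp8_size_gb(30.6, "bf16"))  # base: BF16 on disk -> ~15.3 GB
```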

Quantization Details

  • Method: Per-tensor absmax static quantization
  • Weight dtype: torch.float8_e4m3fn for large linear layer weights (≥2D, ≥1M elements)
  • Scale dtype: torch.float32 (one scalar per weight tensor, stored as weight.__fp8_scale)
  • Preserved in BF16: Norms (RMSNorm), biases, embeddings, small tensors, and non-linear-layer weights
  • Average relative quantization error: ~2.25%
  • Inference: Dynamic per-tensor FP8 quantization of activations + torch._scaled_mm for FP8 tensor-core matmuls
  • Quantized tensors per model: 160 (all BaseLinear and NativeMoELinear weight matrices)
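The per-tensor absmax scheme above can be sketched as follows. This is a minimal NumPy simulation, not the repo's `convert_fp8.py`: the real conversion casts to `torch.float8_e4m3fn`, which the sketch approximates by clamping to the E4M3 range (max 448) and rounding to 3 mantissa bits, ignoring subnormals:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def absmax_scale(w: np.ndarray) -> float:
    """One FP32 scalar per weight tensor (per-tensor absmax)."""
    return float(np.abs(w).max()) / FP8_E4M3_MAX

def fake_fp8_cast(x: np.ndarray) -> np.ndarray:
    """Approximate E4M3 rounding: clamp to range, keep 3 mantissa bits.
    (The real cast is torch's float8_e4m3fn; subnormals are ignored here.)"""
    x = np.clip(np.asarray(x, dtype=np.float64), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    out = np.zeros_like(x)
    nz = x != 0
    exp = np.floor(np.log2(np.abs(x[nz])))
    step = 2.0 ** (exp - 3)  # spacing between representable values
    out[nz] = np.round(x[nz] / step) * step
    return out

def quantize(w: np.ndarray):
    scale = absmax_scale(w)
    return fake_fp8_cast(w / scale), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale
```

Round-tripping a weight through `quantize`/`dequantize` gives a feel for where the ~2.25% average relative error comes from: values whose mantissa needs more than 3 bits get rounded to the nearest representable point.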

VRAM Usage

With FP8 weights, the DiT model loads at ~14.3 GB (down from ~28.5 GB in BF16 / ~76.4 GB as reported from the FP32-on-disk originals). Combined with the text encoder, VAEs, and audio model, total VRAM is approximately:

  • Base generation only: ~35 GB
  • Base + 540p SR: ~49 GB
  • Base + 1080p SR: ~49 GB

Performance

Tested on NVIDIA GB10 (Blackwell) with PyTorch 2.9.0+cu128:

  • ~2.5x faster inference compared to BF16 (FP8 tensor core acceleration via torch._scaled_mm)
  • Quality: Visually comparable to BF16 generation at ~2.25% weight quantization error

Usage

These FP8 weights are designed to be used with the modified inference code from daVinci-MagiHuman that includes FP8 BaseLinear support. The key changes needed in the inference code:

  1. BaseLinear / NativeMoELinear in dit_module.py: Add FP8 detection (weight.dtype == float8_e4m3fn) and route through torch._scaled_mm with dynamic input quantization. Weight dimensions are padded to multiples of 16 at runtime for _scaled_mm compatibility.
  2. Checkpoint loader in load_model_checkpoint.py: Remap key.__fp8_scale → module.weight_scale buffers, and re-install the FP8 weight parameters after load_state_dict (which otherwise auto-casts them to BF16).
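Step 1's routing can be sketched as follows. The helper names are illustrative, and only the two runnable pieces are shown in NumPy: dynamic per-tensor activation scaling and padding to multiples of 16. The matmul itself goes through torch._scaled_mm, a private PyTorch API whose signature varies across versions, so the actual call is indicated only in a comment:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def dynamic_act_scale(x: np.ndarray) -> float:
    """Per-tensor scale for activations, recomputed at every forward pass."""
    return float(np.abs(x).max()) / FP8_E4M3_MAX

def pad_to_multiple_of_16(x: np.ndarray) -> np.ndarray:
    """_scaled_mm requires matmul dims that are multiples of 16;
    zero-pad the last two dimensions up to the next multiple."""
    pad = [(0, 0)] * x.ndim
    for axis in range(max(x.ndim - 2, 0), x.ndim):
        pad[axis] = (0, (-x.shape[axis]) % 16)
    return np.pad(x, pad)

# Inside BaseLinear.forward, the FP8 path is roughly (pseudocode):
#   if weight.dtype == torch.float8_e4m3fn:
#       x_scale = |x|.max() / 448          # quantize x to float8_e4m3fn
#       pad x and weight dims to multiples of 16
#       out = torch._scaled_mm(x_fp8, w_fp8.t(), ...)  # exact signature
#       slice off the padding, add bias in BF16        # varies by version
```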

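A minimal sketch of step 2's key remapping, assuming the checkpoint stores each scale under "<weight key>.__fp8_scale" and the module registers a weight_scale buffer (those names come from the description above; the function itself is hypothetical, not from load_model_checkpoint.py):

```python
def remap_fp8_scales(state_dict: dict) -> dict:
    """Move 'foo.weight.__fp8_scale' entries to 'foo.weight_scale'
    so they load into the module's registered buffer."""
    suffix = ".weight.__fp8_scale"
    out = {}
    for key, value in state_dict.items():
        if key.endswith(suffix):
            out[key[: -len(suffix)] + ".weight_scale"] = value
        else:
            out[key] = value
    return out
```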
Quick Start

```shell
# Download just the distilled model (recommended for faster generation)
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --include "distill_fp8/*" \
    --local-dir /path/to/models/daVinci-MagiHuman

# Download the full base model (higher quality, 32 steps)
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --include "base_fp8/*" \
    --local-dir /path/to/models/daVinci-MagiHuman

# Download everything
huggingface-cli download SanDiegoDude/daVinci-MagiHuman-FP8 \
    --local-dir /path/to/models/daVinci-MagiHuman
```

Other Required Models (not included)

You still need the following from the original repos:

  • Turbo VAE: GAIR/daVinci-MagiHuman (turbo_vae/)
  • Wan2.2 VAE: Wan-AI/Wan2.2-TI2V-5B (Wan2.2_VAE.pth)
  • Stable Audio: stabilityai/stable-audio-open-1.0
  • T5-Gemma: google/t5gemma-9b-9b-ul2

Conversion Script

The FP8 models were created using convert_fp8.py:

```shell
# Convert the distilled model
python convert_fp8.py \
    --input /path/to/daVinci-MagiHuman/distill \
    --output /path/to/daVinci-MagiHuman/distill_fp8

# Convert the full base model
python convert_fp8.py \
    --input /path/to/daVinci-MagiHuman/base \
    --output /path/to/daVinci-MagiHuman/base_fp8
```

The script is available in the daVinci-MagiHuman repository.
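The tensor-selection rule from the Quantization Details section (linear weights, ≥2D, ≥1M elements, with norms, biases, and embeddings left in BF16) can be sketched as the filter such a script would apply. The function and the name-based exclusions below are hypothetical; the real script selects by module type (BaseLinear / NativeMoELinear):

```python
import math

def should_quantize(name: str, shape: tuple) -> bool:
    """Quantize only large linear-layer weight matrices; norms, biases,
    embeddings, and small tensors stay in BF16."""
    if "norm" in name or "embed" in name:  # hypothetical name-based exclusion
        return False
    return (
        name.endswith(".weight")   # weights only, never biases
        and len(shape) >= 2        # matrices, not vectors
        and math.prod(shape) >= 1_000_000  # skip small tensors
    )
```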

Credits

License

Apache 2.0 (same as the original model)
