Instructions for using hustvl/DiffusionVL-Qwen2.5-7B with libraries and local apps.

Libraries

How to use hustvl/DiffusionVL-Qwen2.5-7B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="hustvl/DiffusionVL-Qwen2.5-7B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("hustvl/DiffusionVL-Qwen2.5-7B", trust_remote_code=True, dtype="auto")
```
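The pipeline call above returns its result rather than printing it. The helper below is a sketch for pulling out the assistant's reply. It assumes the common Transformers image-text-to-text output shape (a list whose first item has a `generated_text` key holding either a plain string or a chat transcript ending with the assistant turn); `extract_reply` is a hypothetical name, and the model's remote code may return a different shape.

```python
from typing import Any, Dict, List, Union


def extract_reply(outputs: List[Dict[str, Any]]) -> str:
    """Return the assistant text from an image-text-to-text pipeline result.

    Assumption: outputs[0]["generated_text"] is either a string or a list of
    chat messages whose last element is the assistant turn.
    """
    generated: Union[str, List[Dict[str, Any]]] = outputs[0]["generated_text"]
    if isinstance(generated, str):
        # Only the newly generated text was returned.
        return generated
    content = generated[-1]["content"]
    if isinstance(content, str):
        return content
    # Content given as a list of typed parts: keep and join the text parts.
    return "".join(part.get("text", "") for part in content if part.get("type") == "text")
```

For example: `print(extract_reply(pipe(text=messages)))`.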

Local Apps

vLLM

How to use hustvl/DiffusionVL-Qwen2.5-7B with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the model ships custom code, so remote code must be trusted):
vllm serve "hustvl/DiffusionVL-Qwen2.5-7B" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hustvl/DiffusionVL-Qwen2.5-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```
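For scripts, the same request can be sent from Python. The sketch below uses only the standard library (`json`, `urllib.request`) and mirrors the curl payload. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and the sketch assumes the server replies in the standard OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_request(model: str, prompt: str, image_url: str) -> Dict[str, Any]:
    """Build the same OpenAI-style chat payload as the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def send_chat_request(payload: Dict[str, Any], base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the first choice's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the server is running, call `send_chat_request(build_chat_request("hustvl/DiffusionVL-Qwen2.5-7B", "Describe this image in one sentence.", "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"))`. For the SGLang server, pass `base_url="http://localhost:30000"`.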

Use Docker

```shell
docker model run hf.co/hustvl/DiffusionVL-Qwen2.5-7B
```

SGLang

How to use hustvl/DiffusionVL-Qwen2.5-7B with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (the model ships custom code, so remote code must be trusted):
python3 -m sglang.launch_server \
    --model-path "hustvl/DiffusionVL-Qwen2.5-7B" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hustvl/DiffusionVL-Qwen2.5-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```
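The examples above point `image_url` at a public URL. For a local image, OpenAI-compatible servers commonly accept a base64 `data:` URL in the same field. A minimal sketch, assuming the server accepts that form; `bytes_to_data_url` and `image_file_to_data_url` are hypothetical helper names.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Wrap raw image bytes as a base64 data URL, e.g. data:image/png;base64,..."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_file_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data URL for the "image_url" field."""
    mime, _ = mimetypes.guess_type(path)
    # Refuse files whose extension does not map to an image MIME type.
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    return bytes_to_data_url(Path(path).read_bytes(), mime)
```

The returned string goes where the curl payload has the Britannica URL: `{"type": "image_url", "image_url": {"url": image_file_to_data_url("photo.jpg")}}`.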

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "hustvl/DiffusionVL-Qwen2.5-7B" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "hustvl/DiffusionVL-Qwen2.5-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```
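The container has to download the weights and load the model before it can answer, so a script may want to poll first. A minimal sketch, assuming the server exposes the OpenAI-compatible `GET /v1/models` route on port 30000 as configured above. `wait_for_server` and `http_ok` are hypothetical helpers; the probe and sleep functions are injectable so the loop can be tested without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if GET url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ConnectionError, TimeoutError):
        return False


def wait_for_server(
    url: str = "http://localhost:30000/v1/models",
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until probe() succeeds; give up after about timeout_s / interval_s attempts."""
    attempts = max(1, int(timeout_s // interval_s))
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(interval_s)
    return False
```

Counting attempts rather than reading a clock keeps the loop deterministic under test; the real wait is slightly longer than `timeout_s` because each probe also takes time.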


DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

SOTA dVLM Performance with <5% Data & 2.0× Inference Speedup!

Lunbin Zeng¹*, Jingfeng Yao¹*, Bencheng Liao¹, Hongyuan Tao¹, Wenyu Liu¹, Xinggang Wang¹✉️

¹Huazhong University of Science and Technology

*Equal contribution; ✉️ corresponding author (xgwang@hust.edu.cn)

📰 News

[2025.12.25] 🎄 We have completed our release plan ahead of schedule. DiffusionVL is now fully open-sourced. Merry Christmas to the community!
[2025.12.18] 🎉 Our DiffusionVL paper has been released on arXiv! We have also released the DiffusionVL models translated from Qwen2.5-VL on Hugging Face.

🚀 Release Plan

✅ Release paper
✅ Release DiffusionVL model weights (translated from AR-VLMs)
✅ Release DiffusionVL model weights (translated from AR-LMs)
✅ Release evaluation code
✅ Release training code

📄 Introduction

The diffusion paradigm has emerged as a promising alternative to autoregressive (AR) models, offering the potential for efficient parallel decoding. However, existing diffusion vision language models (dVLMs) largely lag behind mainstream autoregressive vision language models in performance, primarily due to the capability limitations of their base diffusion language models.

DiffusionVL bridges this gap by answering a fundamental question: can we directly translate any existing autoregressive model into a powerful diffusion vision language model? We propose a diffusion finetuning framework that "translates" any pretrained AR model into a diffusion vision language model through a simple paradigm shift and modality shift. Unlike prior dVLMs, which are restricted to fixed generation lengths, DiffusionVL introduces a novel block decoding strategy that allows arbitrary-length generation and KV-cache reuse. With this integrated design, and despite using less than 5% of the training data required by previous methods, DiffusionVL translated from AR-VLMs achieves state-of-the-art performance among existing dVLMs and delivers a 2.0× inference speedup over previous dVLMs.
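To make the block decoding idea concrete, here is a toy sketch of generic block-wise masked diffusion decoding. It is not DiffusionVL's implementation: the `denoise` callable stands in for the model, and `MASK`, `block_size`, `steps_per_block`, and the confidence-based unmasking rule are assumptions chosen for illustration. The KV cache is represented only by the committed `prefix` list. Inside a block, several masked positions are filled in parallel per step; finished blocks are never revisited, which is what lets their keys and values be cached and reused, and stopping at an end-of-sequence token gives arbitrary-length output.

```python
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical mask-token id used only in this sketch

# Hypothetical denoiser: (committed prefix, partially masked block, start index)
# -> one (token, confidence) prediction per position in the block.
Denoiser = Callable[[Sequence[int], Sequence[int], int], List[Tuple[int, float]]]


def block_diffusion_decode(
    denoise: Denoiser,
    prompt: Sequence[int],
    block_size: int = 4,
    steps_per_block: int = 2,
    eos_id: int = 0,
    max_blocks: int = 16,
) -> List[int]:
    """Toy block-wise masked diffusion decoding (illustrative sketch only)."""
    prefix = list(prompt)  # stands in for the KV-cached, already committed context
    generated: List[int] = []
    per_step = -(-block_size // steps_per_block)  # ceil: tokens unmasked per step
    for _ in range(max_blocks):
        block = [MASK] * block_size
        while MASK in block:
            preds = denoise(prefix, block, len(prefix))
            masked = [i for i, t in enumerate(block) if t == MASK]
            # Unmask the most confident masked positions in parallel.
            masked.sort(key=lambda i: preds[i][1], reverse=True)
            for i in masked[:per_step]:
                block[i] = preds[i][0]
        # The block is now final. A real model would append its keys/values
        # to the KV cache once and reuse them for every later block.
        if eos_id in block:
            generated.extend(block[: block.index(eos_id)])
            return generated
        prefix.extend(block)
        generated.extend(block)
    return generated
```

The trade-off the parameters expose: a larger `block_size` with fewer `steps_per_block` decodes more tokens per model call, while a block size of 1 reduces the loop to ordinary autoregressive decoding.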

✨ Highlights

Universal Translation Framework: Translate any AR model into a dVLM with a simple yet effective approach.
Superior Performance: Achieve SOTA dVLM performance using <5% of the training data (738K vs. 16.5M samples).
2.0× Faster Inference: Block decoding strategy enables KV-cache reuse and 2.0× speedup over previous dVLMs.
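The "<5%" figure follows directly from the two sample counts quoted above; a quick check (the counts are the rounded values given in the highlight):

```python
ours = 738_000          # DiffusionVL training samples, as stated above
previous = 16_500_000   # samples used by previous methods, as stated above
ratio = ours / previous
print(f"{ratio:.2%}")   # prints 4.47%
```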

🚀 Get Started

| Document | Description |
|---|---|
| Installation | Environment setup, data and model preparation |
| Training & Evaluation | Train and evaluate DiffusionVL models |
| Inference | Quick inference with pre-trained models |

❤️ Acknowledgements

This repo is mainly built on Qwen2.5-VL, LLaDA-V, BD3LMs, SDAR, and lmms-eval. We thank the authors for their open-source contributions.

📝 Citation

If you find our work useful, please cite our paper:

```bibtex
@misc{zeng2025diffusionvltranslatingautoregressivemodels,
      title={DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models},
      author={Lunbin Zeng and Jingfeng Yao and Bencheng Liao and Hongyuan Tao and Wenyu Liu and Xinggang Wang},
      year={2025},
      eprint={2512.15713},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.15713},
}
```


Model format: Safetensors
Model size: 8B params
Tensor type: BF16

Collection including hustvl/DiffusionVL-Qwen2.5-7B: DiffusionVL (4 items, updated Dec 25, 2025)

Paper for hustvl/DiffusionVL-Qwen2.5-7B: DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models (arXiv 2512.15713, published Dec 17, 2025)