Instructions for using two-tiger/MiMo-VRPRM-7B with libraries and local apps.

Libraries

How to use two-tiger/MiMo-VRPRM-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="two-tiger/MiMo-VRPRM-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("two-tiger/MiMo-VRPRM-7B")
model = AutoModelForImageTextToText.from_pretrained("two-tiger/MiMo-VRPRM-7B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use two-tiger/MiMo-VRPRM-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "two-tiger/MiMo-VRPRM-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "two-tiger/MiMo-VRPRM-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
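
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `post_chat` are illustrative and not part of vLLM, and the URL assumes the default port from the `vllm serve` command above. The payload mirrors the curl example field for field.

```python
import json
import urllib.request

# Assumption: the server was started with `vllm serve` on its default port.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_payload(model, prompt, image_url):
    """Build the JSON body used by the curl example (one text part, one image part)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload, url=VLLM_URL, timeout=120):
    """POST the payload to the OpenAI-compatible endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


payload = build_chat_payload(
    "two-tiger/MiMo-VRPRM-7B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)

# With a server running, send the request and read the first choice:
# reply = post_chat(payload)
# print(reply["choices"][0]["message"]["content"])
```

Building the payload separately from sending it makes it easy to inspect or log the request before a server is available.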

Use Docker

docker model run hf.co/two-tiger/MiMo-VRPRM-7B

SGLang

How to use two-tiger/MiMo-VRPRM-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "two-tiger/MiMo-VRPRM-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "two-tiger/MiMo-VRPRM-7B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "two-tiger/MiMo-VRPRM-7B" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use two-tiger/MiMo-VRPRM-7B with Docker Model Runner:
```
docker model run hf.co/two-tiger/MiMo-VRPRM-7B
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

VRPRM-MiMo-7B

VRPRM-MiMo-7B is a visual process reward model introduced in the paper "VRPRM: Process Reward Modeling via Visual Reasoning".

VRPRM is designed to evaluate intermediate reasoning steps for multimodal problems. The model is intended for visual process reward modeling, reasoning-step scoring, and Best-of-N selection for vision-language model outputs.

Model Details

Model family: VRPRM
Release variant: MiMo-7B
Serialized architecture: Qwen2_5_VLForConditionalGeneration
Model type: qwen2_5_vl
Weights format: sharded safetensors
Recommended library: transformers

Training Summary

The VRPRM paper trains the model with a two-stage recipe:

Supervised fine-tuning cold start on high-quality CoT-PRM data.
Reinforcement learning scaling on lower-cost non-CoT PRM data.

The release data is derived from VisualPRM400K-style process supervision.

Intended Use

This model is intended for research on:

Visual process reward modeling
Multimodal reasoning evaluation
Step-level scoring of visual question answering rationales
Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.
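
As an illustration of the Best-of-N use case listed above, here is a minimal sketch of selecting one response from several candidates once each reasoning step has a score. It assumes the step scores have already been extracted from the model's output as floats (higher is better); how VRPRM produces those scores is not covered here, and the function names and aggregation rules (`min`, `mean`, `last`) are common choices for process reward models, not necessarily the ones used in the VRPRM paper.

```python
from typing import List, Sequence


def aggregate_step_scores(step_scores: Sequence[float], method: str = "min") -> float:
    """Collapse per-step scores into one response-level score."""
    if not step_scores:
        raise ValueError("a candidate needs at least one step score")
    if method == "min":    # a response is only as good as its weakest step
        return min(step_scores)
    if method == "mean":   # average quality across steps
        return sum(step_scores) / len(step_scores)
    if method == "last":   # score of the final step only
        return step_scores[-1]
    raise ValueError(f"unknown aggregation method: {method}")


def best_of_n(candidates: List[Sequence[float]], method: str = "min") -> int:
    """Return the index of the candidate with the highest aggregated score."""
    totals = [aggregate_step_scores(scores, method) for scores in candidates]
    return max(range(len(totals)), key=totals.__getitem__)


# Hypothetical step scores for three candidate responses to the same question.
candidates = [
    [0.9, 0.2, 0.8],
    [0.7, 0.6, 0.7],
    [0.95, 0.9, 0.1],
]
print(best_of_n(candidates, method="min"))   # -> 1
print(best_of_n(candidates, method="last"))  # -> 0
```

The two calls pick different candidates, which shows why the aggregation rule should be fixed and validated per task rather than assumed.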

Usage

Load the model with Hugging Face Transformers from the repository root:

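from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "two-tiger/MiMo-VRPRM-7B"

# Load the processor (tokenizer and image preprocessor) and the model weights.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",  # place weights on available devices (requires accelerate)
    trust_remote_code=True,
)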

For the complete inference and evaluation pipeline, use the VRPRM project code.

Limitations

Reward scores depend on the quality of the generated visual reasoning process.
Generated reasoning introduces higher latency than direct scalar reward modeling.
The model may inherit biases from its base model and process supervision data.
Evaluation should be performed on task-specific validation sets before deployment.
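
One way to run the validation suggested above is to compare Best-of-N accuracy against the expected accuracy of picking a candidate at random. The sketch below assumes each validation problem comes with several candidate responses, each already paired with a reward score and a correctness label; the data layout and function names are illustrative, not part of the VRPRM project code.

```python
from typing import List, Tuple

# (reward score, whether the candidate's final answer is correct)
Candidate = Tuple[float, bool]


def best_of_n_accuracy(problems: List[List[Candidate]]) -> float:
    """Fraction of problems where the highest-scored candidate is correct."""
    hits = 0
    for candidates in problems:
        best = max(candidates, key=lambda c: c[0])
        hits += int(best[1])
    return hits / len(problems)


def random_pick_accuracy(problems: List[List[Candidate]]) -> float:
    """Expected accuracy when one candidate per problem is chosen uniformly at random."""
    per_problem = [
        sum(int(correct) for _, correct in candidates) / len(candidates)
        for candidates in problems
    ]
    return sum(per_problem) / len(per_problem)


# Hypothetical validation set: four problems with two candidates each.
problems = [
    [(0.9, True), (0.4, False)],
    [(0.3, True), (0.8, False)],
    [(0.7, True), (0.6, True)],
    [(0.2, False), (0.5, True)],
]
print(best_of_n_accuracy(problems))    # -> 0.75
print(random_pick_accuracy(problems))  # -> 0.625
```

A reward model is only useful for selection on a task if the first number is clearly above the second on that task's validation data.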

Citation

@article{vrprm2026,
  title={VRPRM: Process Reward Modeling via Visual Reasoning},
  author={Chen, Xinquan and Yue, Chongying and Liu, Bangwei and Wang, Xuhong and Wang, Yingchun and Lu, Chaochao},
  year={2026}
}

Safetensors

Model size: 8B params
Tensor type: BF16