VRPRM-MiMo-7B

VRPRM-MiMo-7B is a visual process reward model from VRPRM: Process Reward Modeling via Visual Reasoning.

VRPRM is designed to evaluate intermediate reasoning steps for multimodal problems. The model is intended for visual process reward modeling, reasoning-step scoring, and Best-of-N selection for vision-language model outputs.

Model Details

  • Model family: VRPRM
  • Release variant: MiMo-7B
  • Serialized architecture: Qwen2_5_VLForConditionalGeneration
  • Model type: qwen2_5_vl
  • Weights format: sharded safetensors
  • Recommended library: transformers

Training Summary

The VRPRM paper trains the model with a two-stage recipe:

  1. Supervised fine-tuning cold start on high-quality CoT-PRM data.
  2. Reinforcement learning scaling on lower-cost non-CoT PRM data.

The release data is derived from VisualPRM400K-style process supervision.

Intended Use

This model is intended for research on:

  • Visual process reward modeling
  • Multimodal reasoning evaluation
  • Step-level scoring of visual question answering rationales
  • Best-of-N selection for vision-language model responses

This model is not intended to be used as a standalone assistant.

Usage

Load the model with Hugging Face Transformers from the repository root:

from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "YOUR_USERNAME/VRPRM-MiMo-7B"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

For the complete inference and evaluation pipeline, use the VRPRM project code.

Limitations

  • Reward scores depend on the quality of the generated visual reasoning process.
  • Generated reasoning introduces higher latency than direct scalar reward modeling.
  • The model may inherit biases from its base model and process supervision data.
  • Evaluation should be performed on task-specific validation sets before deployment.

Citation

@article{vrprm2026,
  title={VRPRM: Process Reward Modeling via Visual Reasoning},
  author={Chen, Xinquan and Yue, Chongying and Liu, Bangwei and Wang, Xuhong and Wang, Yingchun and Lu, Chaochao},
  year={2026}
}
Downloads last month
10
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for two-tiger/MiMo-VRPRM-7B

Quantizations
1 model

Collection including two-tiger/MiMo-VRPRM-7B