Metis-8B-ColdStart

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Metis-8B-ColdStart is the SFT (Supervised Fine-Tuning) checkpoint of the Metis framework, fine-tuned from Qwen3-VL-8B-Instruct on the curated Metis-ColdStart dataset. This checkpoint serves as the starting point for HDPO reinforcement learning, which produces the final Metis-8B-RL model.

[Paper (arXiv)] | [GitHub] | [RL Model] | [ColdStart Data] | [RL Data]

Model Details

Attribute Value
Base model Qwen3-VL-8B-Instruct
Training stage Supervised Fine-Tuning (Cold Start)
Training data Metis-ColdStart (~27K samples)
Next stage Metis-8B-RL (HDPO reinforcement learning)
License Apache-2.0

Cold Start Data Curation Pipeline

The SFT corpus is curated from publicly available tool-augmented multimodal trajectories (DeepEyesV2, V-Interaction, Thyme, OpenMMReasoner) through a rigorous three-stage pipeline:

  1. Eradicating hallucinated environmental dynamics — Execute all code in a sandbox environment; discard trajectories with execution failures.
  2. Isolating genuine tool necessity — Filter out samples where the base model achieves pass@8 = 1 without any tools, ensuring only genuinely tool-dependent samples remain.
  3. Multidimensional meta-cognitive filtering — An LLM judge evaluates visual relevance, reasoning coherence, and tool-use rationale to ensure high quality.

Training Pipeline

Qwen3-VL-8B-Instruct
        │
        ▼  SFT on Metis-ColdStart (~27K samples)
  Metis-8B-ColdStart  ← (this checkpoint)
        │
        ▼  HDPO on Metis-RL (~5K prompts)
   Metis-8B-RL  (final model)

Usage

Please refer to the GitHub repository for full installation and inference instructions.

Installation

git clone https://github.com/Accio-Lab/Metis.git
cd Metis
pip install -e verl
pip install -e ".[vllm,search_tool,python_code_dep]"

Citation

@article{yan2026metis,
  title={Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models},
  author={Yan, Shilin and Tong, Jintao and Xue, Hongwei and Tang, Xiaojun and Wang, Yangyang and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2604.08545},
  year={2026}
}

Acknowledgments

Metis is built upon verl, verl-tool, and Qwen3-VL.

Downloads last month
9
Safetensors
Model size
770k params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Accio-Lab/Metis-8B-ColdStart

Finetuned
(225)
this model
Finetunes
1 model

Dataset used to train Accio-Lab/Metis-8B-ColdStart

Paper for Accio-Lab/Metis-8B-ColdStart