Metis-8B-ColdStart

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Metis-8B-ColdStart is the SFT (Supervised Fine-Tuning) checkpoint of the Metis framework, fine-tuned from Qwen3-VL-8B-Instruct on the curated Metis-ColdStart dataset. This checkpoint serves as the starting point for HDPO reinforcement learning, which produces the final Metis-8B-RL model.

[Paper (arXiv)] | [GitHub] | [RL Model] | [ColdStart Data] | [RL Data]

Model Details

Attribute	Value
Base model	Qwen3-VL-8B-Instruct
Training stage	Supervised Fine-Tuning (Cold Start)
Training data	Metis-ColdStart (~27K samples)
Next stage	→ Metis-8B-RL (HDPO reinforcement learning)
License	Apache-2.0

Cold Start Data Curation Pipeline

The SFT corpus is curated from publicly available tool-augmented multimodal trajectories (DeepEyesV2, V-Interaction, Thyme, OpenMMReasoner) through a rigorous three-stage pipeline:

Eradicating hallucinated environmental dynamics — Execute all code in a sandbox environment; discard trajectories with execution failures.
Isolating genuine tool necessity — Filter out samples where the base model achieves pass@8 = 1 without any tools, ensuring only genuinely tool-dependent samples remain.
Multidimensional meta-cognitive filtering — An LLM judge evaluates visual relevance, reasoning coherence, and tool-use rationale to ensure high quality.

Training Pipeline

Qwen3-VL-8B-Instruct
        │
        ▼  SFT on Metis-ColdStart (~27K samples)
  Metis-8B-ColdStart  ← (this checkpoint)
        │
        ▼  HDPO on Metis-RL (~5K prompts)
   Metis-8B-RL  (final model)

Usage

Please refer to the GitHub repository for full installation and inference instructions.

Installation

git clone https://github.com/Accio-Lab/Metis.git
cd Metis
pip install -e verl
pip install -e ".[vllm,search_tool,python_code_dep]"

Citation

@article{yan2026metis,
  title={Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models},
  author={Yan, Shilin and Tong, Jintao and Xue, Hongwei and Tang, Xiaojun and Wang, Yangyang and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2604.08545},
  year={2026}
}

Acknowledgments

Metis is built upon verl, verl-tool, and Qwen3-VL.

Downloads last month: 9

Safetensors

Model size

770k params

Tensor type

BF16

Model tree for Accio-Lab/Metis-8B-ColdStart

Base model

Qwen/Qwen3-VL-8B-Instruct

Finetuned

(225)

this model

Finetunes

1 model

Dataset used to train Accio-Lab/Metis-8B-ColdStart

Paper for Accio-Lab/Metis-8B-ColdStart

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Paper • 2604.08545 • Published 2 days ago • 27