# EgoNormia-Cosmos-Reason2-2B-v5-shortcot
Multi-task SFT fine-tune of nvidia/Cosmos-Reason2-2B on the EgoNormia social norm benchmark. This v5 variant keeps the same 3-task setup as v4, but compresses the reasoning traces into short 1-sentence CoT supervision.
## Training
| Parameter | Value |
|---|---|
| Base model | nvidia/Cosmos-Reason2-2B (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (multi-task) |
| Train samples | 4959 |
| Training file | data/egonormia_llava_shortcot_train.json |
| CoT style | Short CoT, 1-sentence distilled traces |
| CoT length | median ~25 words (compressed from ~64 words) |
| Epochs | 3 |
| Global batch | 64 (8 replicas x 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | video_prev.mp4, 8 frames |
| Hardware | 8x A100-SXM4-80GB |
| Seed 1 run dir | outputs/egonormia_sft/20260228141559/ |
| Seed 2 run dir | outputs/egonormia_sft/20260301002022/ |
| Uploaded checkpoint | seed2 step_150 |
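The training file follows the LLaVA-style conversation format referenced by its name. A hypothetical single record is sketched below; the field names follow the common LLaVA convention and the prompt/answer wording is purely illustrative, not taken from the actual file:

```python
import json

# Hypothetical multi-task sample in LLaVA-style conversation format.
# Field names and the short-CoT answer text are illustrative only.
sample = {
    "video": "video_prev.mp4",
    "conversations": [
        {
            "from": "human",
            "value": "<video>\nWhich action is most socially appropriate next, and why?",
        },
        {
            "from": "gpt",
            "value": (
                "The person is handing over an item, so accepting it politely "
                "respects the exchange norm. Action: (B). Justification: (3)."
            ),
        },
    ],
}

# Round-trip through JSON to confirm the record is serializable.
decoded = json.loads(json.dumps(sample))
print(decoded["video"])
```

The short 1-sentence CoT lives inline in the assistant turn, ahead of the final answer labels, which is what distinguishes this v5 data from the longer v4 traces.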
## Evaluation (200 verified test samples)

| Model | Action | Justification | Both | S-IoU |
|---|---|---|---|---|
| Zero-shot | 58.5% | 81.5% | 51.0% | 0.516 |
| v3 best (step_175) | 78.0% | 97.0% | 77.0% | 0.664 |
| v5 seed1 (step_155) | 80.5% | 95.5% | 78.5% | 0.618 |
| v5 seed2 (step_150) | 82.0% | 95.5% | 78.5% | 0.634 |
Average over the two seed-wise best checkpoints:
- Action: 81.25%
- Justification: 95.5%
- Both: 78.5%
- S-IoU: 0.626
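These figures are plain means over the two seed-wise best checkpoints, which can be reproduced directly from the table above:

```python
# Per-seed best checkpoints: (action %, justification %, both %, S-IoU).
seed1 = (80.5, 95.5, 78.5, 0.618)  # seed1 step_155
seed2 = (82.0, 95.5, 78.5, 0.634)  # seed2 step_150

mean = [round((a + b) / 2, 4) for a, b in zip(seed1, seed2)]
print(mean)  # [81.25, 95.5, 78.5, 0.626]
```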
## Robustness (option shuffle)

| Checkpoint | Delta Action | Delta S-IoU | Sign test p | Verdict |
|---|---|---|---|---|
| seed1 step_155 | -2.0pt | -0.035 | 0.585 | pass |
| seed2 step_150 | -5.0pt | -0.027 | 0.076 | pass |
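The shuffle check compares per-sample action correctness before and after permuting the answer options; a two-sided exact sign test on the discordant flips can be sketched as follows (the flip counts in the example are hypothetical, not this run's actual values):

```python
from math import comb

def sign_test_p(wins: int, losses: int) -> float:
    """Two-sided exact sign test: under H0 each discordant flip is a fair coin."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 4 samples flipped correct -> wrong, 2 flipped the other way.
print(sign_test_p(2, 4))  # 0.6875
```

A large p-value means the before/after differences are consistent with chance, which is the "pass" criterion used in the table.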
## Notes

- v5 recovers the robustness lost in v4 while keeping stronger action accuracy than v3.
- Best S-IoU still trails v3 (0.634 vs 0.664), so the gain is mainly in action / joint accuracy rather than sensibility quality.
- On this run family, explicit think-mode inference hurts performance: for seed2, no-think `step_150` reaches 82.0% action / 78.5% both, while think mode peaks lower at 78.0% action / 72.5% both.
## Usage

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v5-shortcot",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v5-shortcot")
```
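The processor then expects a chat-style message list with the prior-context video clip attached. A minimal sketch of building that input follows; the prompt wording is illustrative, and the message dict would be passed to `processor.apply_chat_template` and `model.generate` for actual inference:

```python
# Qwen3-VL-style chat message with the prior-context clip attached.
# The prompt text is illustrative only.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "video_prev.mp4"},
            {
                "type": "text",
                "text": "Choose the most norm-compliant next action and justify it in one sentence.",
            },
        ],
    }
]

video_parts = [c for c in messages[0]["content"] if c["type"] == "video"]
print(len(video_parts))  # 1
```

The model was trained on 8 uniformly sampled frames per clip, so matching that frame count at inference time is the safest choice.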