# EgoNormia-Cosmos-Reason2-2B-v5-shortcot

A multi-task SFT fine-tune of `nvidia/Cosmos-Reason2-2B` on the EgoNormia social-norm benchmark. This v5 variant keeps the same three-task setup as v4 but compresses the reasoning traces into short, single-sentence CoT supervision.

## Training

| Parameter | Value |
| --- | --- |
| Base model | `nvidia/Cosmos-Reason2-2B` (Qwen3-VL-2B) |
| Tasks | Action + Justification + Sensibility (multi-task) |
| Train samples | 4959 |
| Training file | `data/egonormia_llava_shortcot_train.json` |
| CoT style | Short CoT, 1-sentence distilled traces |
| CoT length | median ~25 words (compressed from ~64 words) |
| Epochs | 3 |
| Global batch | 64 (8 replicas × 8 per replica) |
| Learning rate | 1e-5 (cosine decay, 3% warmup) |
| Context length | 8192 |
| Video input | `video_prev.mp4`, 8 frames |
| Hardware | 8× A100-SXM4-80GB |
| Seed 1 run dir | `outputs/egonormia_sft/20260228141559/` |
| Seed 2 run dir | `outputs/egonormia_sft/20260301002022/` |
| Uploaded checkpoint | seed2 `step_150` |
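For reference, the learning-rate schedule in the table (peak 1e-5, 3% linear warmup, cosine decay) can be sketched as below. This is a minimal sketch, not the trainer's actual implementation; in particular, decay to exactly zero is an assumption, and the total step count (≈ 4959 / 64 × 3 ≈ 232) is only approximate.

```python
import math


def lr_at(step, total_steps, peak_lr=1e-5, warmup_frac=0.03):
    """Cosine-decay LR schedule with linear warmup.

    Sketch of the schedule named in the table above; assumes decay to
    zero (the real trainer may use a non-zero floor).
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over the first 3% of steps.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```

With ~232 total steps, warmup spans the first ~6 optimizer steps and the checkpoints at step 150–155 sit roughly two thirds of the way down the cosine.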

## Evaluation (200 verified test samples)

| Model | Action | Justification | Both | S-IoU |
| --- | --- | --- | --- | --- |
| Zero-shot | 58.5% | 81.5% | 51.0% | 0.516 |
| v3 best (step_175) | 78.0% | 97.0% | 77.0% | 0.664 |
| v5 seed1 (step_155) | 80.5% | 95.5% | 78.5% | 0.618 |
| v5 seed2 (step_150) | 82.0% | 95.5% | 78.5% | 0.634 |

Average over the two seed-wise best checkpoints:

- Action: 81.25%
- Justification: 95.5%
- Both: 78.5%
- S-IoU: 0.626
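The seed averages follow directly from the two per-seed rows in the table:

```python
# Per-seed best checkpoints, copied from the evaluation table.
seed1 = {"action": 80.5, "justification": 95.5, "both": 78.5, "s_iou": 0.618}
seed2 = {"action": 82.0, "justification": 95.5, "both": 78.5, "s_iou": 0.634}

# Simple two-seed mean; matches the averaged numbers reported above
# (S-IoU 0.626 after rounding).
avg = {k: (seed1[k] + seed2[k]) / 2 for k in seed1}
```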

## Robustness (option shuffle)

| Checkpoint | Δ Action | Δ S-IoU | Sign test p | Verdict |
| --- | --- | --- | --- | --- |
| seed1 step_155 | -2.0pt | -0.035 | 0.585 | pass |
| seed2 step_150 | -5.0pt | -0.027 | 0.076 | pass |
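The card does not spell out the sign test; a minimal sketch, assuming it is the standard two-sided exact sign test on paired per-sample correctness (original vs. shuffled option order), where only discordant pairs carry information:

```python
from math import comb


def sign_test_p(orig_correct, shuf_correct):
    """Two-sided exact sign test on paired per-sample correctness.

    Only discordant pairs (correct in one condition but not the other)
    are informative; under H0 each direction is equally likely, so the
    discordant counts follow Binomial(n, 0.5).
    """
    wins = sum(o and not s for o, s in zip(orig_correct, shuf_correct))
    losses = sum(s and not o for o, s in zip(orig_correct, shuf_correct))
    n = wins + losses
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a shift
    k = min(wins, losses)
    # Two-sided exact binomial tail at p = 0.5, doubled.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, p)
```

A p-value above 0.05, as in both rows of the table, means the shuffle-induced drop is not statistically distinguishable from noise at the 5% level, hence the "pass" verdicts.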

## Notes

- v5 recovers the robustness lost in v4 while keeping stronger action accuracy than v3.
- Best S-IoU still trails v3 (0.634 vs 0.664), so the gain is mainly a better action / joint-accuracy tradeoff rather than improved sensibility quality.
- On this run family, explicit think-mode inference hurts performance: for seed2, no-think step_150 reaches 82.0% action / 78.5% both, while think mode peaks lower at 78.0% action / 72.5% both.

## Usage

```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "robertzty/EgoNormia-Cosmos-Reason2-2B-v5-shortcot",
    torch_dtype="bfloat16",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("robertzty/EgoNormia-Cosmos-Reason2-2B-v5-shortcot")
```
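One possible inference pattern on top of the loading snippet, assuming a recent transformers version whose chat template accepts video entries; the question text below is a placeholder, since the exact prompt format used during SFT is not documented in this card:

```python
# Chat message with a video input; the processor samples frames
# (8 per clip during training) when tokenizing.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "video_prev.mp4"},
        # Hypothetical question; use the benchmark's actual prompt format.
        {"type": "text", "text": "Which action is most socially appropriate next, and why?"},
    ],
}]

# With `model` and `processor` loaded as above:
# inputs = processor.apply_chat_template(
#     messages, add_generation_prompt=True, tokenize=True,
#     return_dict=True, return_tensors="pt",
# ).to(model.device)
# out = model.generate(**inputs, max_new_tokens=128)
# print(processor.batch_decode(out[:, inputs["input_ids"].shape[-1]:],
#                              skip_special_tokens=True)[0])
```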