CoLaR Qwen3-4B Flawed Fictions RL

Compressed Latent Reasoning (CoLaR) model fine-tuned with reinforcement learning on the Flawed Fictions dataset.

Base model: Qwen/Qwen3-4B-Instruct-2507
WandB run: hqaakpve

Checkpoints

Tag                                 Epoch  Step   val/reward
best-epoch24-val_reward=0.6719      24     14528  0.6719
second-epoch08-val_reward=0.6406    8      5184   0.6406
last-epoch28-val_reward=0.5781      28     16864  0.5781
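
Each tag name encodes the epoch and validation reward, so the values can be parsed from the tag instead of being hard-coded. The following is a minimal sketch, assuming the tag format stays `<label>-epoch<NN>-val_reward=<float>` as in the three tags listed here. `CheckpointTag` and `parse_tag` are illustrative names and are not part of the repository.

```python
import re
from typing import NamedTuple


class CheckpointTag(NamedTuple):
    label: str
    epoch: int
    val_reward: float


# Assumed tag format, inferred only from the three tags in the table.
_TAG_RE = re.compile(
    r"^(?P<label>[a-z]+)-epoch(?P<epoch>\d+)-val_reward=(?P<reward>\d+\.\d+)$"
)


def parse_tag(tag: str) -> CheckpointTag:
    """Split a checkpoint tag into its label, epoch, and validation reward."""
    m = _TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized checkpoint tag: {tag!r}")
    return CheckpointTag(m["label"], int(m["epoch"]), float(m["reward"]))


TAGS = [
    "best-epoch24-val_reward=0.6719",
    "second-epoch08-val_reward=0.6406",
    "last-epoch28-val_reward=0.5781",
]

# Pick the tag with the highest validation reward.
best = max(map(parse_tag, TAGS), key=lambda t: t.val_reward)
```

Here `best` resolves to the epoch-24 checkpoint, which matches the `best-` label in the table.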

Each checkpoint is stored as a tagged commit on main. To download one, pass its tag as the revision argument:

from huggingface_hub import snapshot_download
# snapshot_download returns the local directory that holds the files for this tag.
local_dir = snapshot_download("agurung/colar-qwen3-4b-ff-rl", revision="best-epoch24-val_reward=0.6719")

File Structure

  • model.safetensors — LLM weights (merged LoRA if applicable)
  • extra_state.pt — Latent policy network weights
  • export_meta.json — Export metadata
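
Before loading anything from a downloaded snapshot, it can help to confirm that the three files listed above are present. The following is a minimal sketch using only the standard library. `check_snapshot` is a hypothetical helper, and it assumes the weights are stored as a single model.safetensors file, as listed above. The schema of export_meta.json is not documented here, so the metadata is returned as parsed JSON without assuming any keys.

```python
import json
from pathlib import Path

# File names taken from the File Structure list.
EXPECTED_FILES = ("model.safetensors", "extra_state.pt", "export_meta.json")


def check_snapshot(local_dir: str) -> dict:
    """Verify the expected files exist and return the parsed export metadata."""
    root = Path(local_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {root}: {', '.join(missing)}")
    # Assumption: export_meta.json holds a JSON object; its keys are unknown.
    with open(root / "export_meta.json", encoding="utf-8") as f:
        return json.load(f)
```

Pass the directory returned by `snapshot_download` as `local_dir`.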

Model size: 4B params (Safetensors)
Tensor type: BF16