# CoLaR Qwen3-4B Flawed Fictions RL
Compressed Latent Reasoning (CoLaR) model fine-tuned with reinforcement learning on the Flawed Fictions dataset.
- Base model: `Qwen/Qwen3-4B-Instruct-2507`
- WandB run: `hqaakpve`
## Checkpoints
| Tag | Epoch | Step | val/reward |
|---|---|---|---|
| `best-epoch24-val_reward=0.6719` | 24 | 14528 | 0.6719 |
| `second-epoch08-val_reward=0.6406` | 8 | 5184 | 0.6406 |
| `last-epoch28-val_reward=0.5781` | 28 | 16864 | 0.5781 |
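Each checkpoint is stored as a tagged commit on `main`. To download a specific checkpoint, pass its tag as the `revision`:

```python
from huggingface_hub import snapshot_download

# Download the best checkpoint (by val/reward); returns the path of the local snapshot directory.
local_dir = snapshot_download(
    "agurung/colar-qwen3-4b-ff-rl",
    revision="best-epoch24-val_reward=0.6719",
)
```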
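The tag names encode the epoch and validation reward of each checkpoint. The sketch below parses them and picks the tag with the highest val/reward. It assumes every tag has the form `<label>-epoch<NN>-val_reward=<float>`; that format is inferred from the three tags in the table, not a documented contract. `parse_tag` and `best_by_val_reward` are hypothetical helpers, not part of any library.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed tag format, inferred from the checkpoint table:
#   <label>-epoch<NN>-val_reward=<float>
TAG_PATTERN = re.compile(
    r"^(?P<label>[a-z]+)-epoch(?P<epoch>\d+)-val_reward=(?P<reward>\d+\.\d+)$"
)


class CheckpointTag(NamedTuple):
    name: str
    label: str
    epoch: int
    val_reward: float


def parse_tag(name: str) -> Optional[CheckpointTag]:
    """Parse a checkpoint tag name; return None if it does not match the assumed format."""
    match = TAG_PATTERN.match(name)
    if match is None:
        return None
    return CheckpointTag(name, match["label"], int(match["epoch"]), float(match["reward"]))


def best_by_val_reward(tag_names: Iterable[str]) -> Optional[CheckpointTag]:
    """Return the parsed tag with the highest val/reward, skipping names that don't match."""
    parsed = [tag for tag in map(parse_tag, tag_names) if tag is not None]
    return max(parsed, key=lambda tag: tag.val_reward, default=None)


tags = [
    "best-epoch24-val_reward=0.6719",
    "second-epoch08-val_reward=0.6406",
    "last-epoch28-val_reward=0.5781",
]
best = best_by_val_reward(tags)
print(best.name, best.epoch, best.val_reward)  # best-epoch24-val_reward=0.6719 24 0.6719
```

With network access, the tag names could instead come from `HfApi().list_repo_refs("agurung/colar-qwen3-4b-ff-rl").tags`, where each entry has a `.name`. The chosen name can then be passed as `revision` to `snapshot_download`.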
## File Structure
- `model.safetensors`: LLM weights (merged LoRA, if applicable)
- `extra_state.pt`: latent policy network weights
- `export_meta.json`: export metadata
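The following stdlib-only sketch sanity-checks a downloaded snapshot before loading it. It assumes the three files listed above sit at the snapshot root. `inspect_snapshot` is a hypothetical helper. Because this card does not document the schema of `export_meta.json`, the helper returns its contents as plain JSON rather than interpreting specific keys.

```python
import json
from pathlib import Path

# File names as listed under "File Structure"; their internal layout is not documented here.
EXPECTED_FILES = ("model.safetensors", "extra_state.pt", "export_meta.json")


def inspect_snapshot(snapshot_dir: str) -> dict:
    """Check that a checkpoint snapshot has the expected files and return its export metadata.

    Raises FileNotFoundError if any expected file is missing.
    """
    root = Path(snapshot_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing checkpoint files in {root}: {missing}")
    # Schema of export_meta.json is unknown; return whatever JSON it contains.
    with open(root / "export_meta.json", encoding="utf-8") as f:
        return json.load(f)
```

Usage: `meta = inspect_snapshot(snapshot_download("agurung/colar-qwen3-4b-ff-rl", revision=...))`. After this check passes, the weight files could be read with `safetensors.torch.load_file(...)` and `torch.load(..., map_location="cpu")`. This card does not describe how the latent policy weights attach to the LLM.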