Qwen 3 4B RLM RLVR Collection: LoRA adapters, full fine-tuned checkpoints, and SFT warmup models trained with RLVR in the recursive language model depth-1 harness. • 12 items • Updated 4 days ago
lsteno/Qwen3-4B-Instruct-2507-RLM-RLVR-FullFT-lr5e-6-depth1-v1 • Text Generation • 4B • Updated 5 days ago • 71