LoRA adapters, fully fine-tuned checkpoints, and SFT warmup models from RLVR training in the recursive language model (RLM) depth-1 harness.
lsteno/Qwen3-4B-Instruct-2507-RLM-SFT-v3-per-root-turn
4B • Updated • 65
lsteno/Qwen3-4B-Instruct-2507-RLM-RL-depth1-r4-a8-lr5e-7-s150-lora
Updated • 6
lsteno/Qwen3-4B-Instruct-2507-RLM-RL-depth1-r4-a8-lr1e-5-s150-lora
Updated • 3
lsteno/qwen3-rlm-depth1-r4-a8-lr1e-4-s150-bal35f40v1-lora
Updated • 3