Add --hub-resume and --step-offset to allow resuming directly from HF Hub PEFT checkpoints 8bd8552 shank committed on 8 days ago
Revert speed optimizations to prioritize real-world model quality cf25957 shank committed on 9 days ago
fix: upgrade bitsandbytes>=0.49.0 (triton.ops), switch to Qwen2.5-Coder-3B a2fa47a shank committed on Apr 26
fix: torch at build time, remove mergekit (conflicts accelerate/peft/trl) 2bfaf77 shank committed on Apr 26
fix: remove wandb - click conflict with gradio causes resolution-too-deep 2005cd2 shank committed on Apr 26
chore: normalize dataset inputs and fix mergekit dependency for TRL 0.14.0 e67270e shank committed on Apr 25
Auto-detect GPU: bfloat16+batch2+gen8 on A100, float16+batch1+gen4 on T4; same script works on both ea6fe4e shank committed on Apr 25
Reduce max_completion_length to 160 for T4 speed: target 1000 steps in <8hrs 9487853 shank committed on Apr 25
Optimize for Kaggle P100: float16, batch=1, grad_accum=8, num_gen=4, max_completion=256, lora_r=8 73f957d shank committed on Apr 25
Fix GRPOConfig: rename max_new_tokens to max_completion_length for trl==0.14.0 8b16369 shank committed on Apr 25
Stabilize Space runtime: pin ML deps and disable runtime package drift 663b8db shank committed on Apr 25
Pin torch to cu121 build + use model.device instead of hardcoded cuda string 8f291e0 shank committed on Apr 25
Replace unsloth with bitsandbytes+peft: fixes CUDA driver incompatibility on HF A100 c325ad7 shank committed on Apr 25
Reduce training to 500 steps with tightened curriculum for A10G budget ba8df98 shank committed on Apr 25
Optimize for A100 80GB: 8 generations, batch 4, lr 2e-5, dense logging 2b1fbf3 shank committed on Apr 25
Reduce training to 500 steps with tightened curriculum for A10G budget 3152fa9 shank committed on Apr 25