Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Paper • 2408.12480 • Published Aug 22, 2024 • 27
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 7
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 189
Article The 4 Things Qwen-3’s Chat Template Teaches Us cfahlgren1 • Apr 30, 2025 • 88
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer Pringled • Oct 14, 2024 • 104
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 38
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 175