Article We Got Claude to Build CUDA Kernels and teach open models! +2 burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156
Article The Transformers Library: standardizing model definitions +2 lysandre, ArthurZ, pcuenq, julien-c • May 15, 2025 • 122
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 17
Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 +4 RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024 • 41
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13, 2024 • 69
Article Welcome Falcon Mamba: The first strong attention-free 7B model +4 JingweiZuo, yellowvm, DhiyaEddine, IChahed, ybelkada, Gkunsch • Aug 12, 2024 • 113