benchmark-evaluation

allenai/ai2_arc Viewer • Updated Dec 21, 2023 • 7.79k • 457k • 337
Rowan/hellaswag Viewer • Updated Jul 10, 2025 • 60k • 309k • 175
ybisk/piqa Updated Jan 18, 2024 • 58.6k • 104
EleutherAI/lambada_openai Viewer • Updated Jul 10, 2025 • 30.9k • 97.8k • 49