TokenRouter: Efficient Serving System for Token-Level LLM Routing
Papers
SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
🎉 Accepted by ICLR 2026
MARS: Reinforcing Multi-Agent Reasoning of LLMs through Self-Play in Strategic Games
Paper • 2510.15414
nics-efc/MARSHAL-Generalist-Qwen3-4B
Text Generation • 4B
nics-efc/MARSHAL-Generalist-Qwen3-8B
Text Generation • 8B
nics-efc/MARSHAL-Tic-Tac-Toe-Qwen3-4B
Text Generation • 4B
Artifacts of paper "Cache-to-Cache: Direct Semantic Communication Between Large Language Models"
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Paper • 2505.21600
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
Paper • 2412.17153
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Paper • 2307.15337
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Paper • 2406.08552
Verifiable Process Rewards for Agentic Reasoning
Collections for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"