Reinforcement learning - a Vansh2676 Collection

Vansh2676's Collections

Reinforcement learning

updated 1 day ago

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Paper • 2605.15980 • Published 7 days ago • 34
NGRPO: Negative-enhanced Group Relative Policy Optimization

Paper • 2509.18851 • Published Sep 23, 2025 • 2
CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

Paper • 2605.19436 • Published 3 days ago • 13
Delta Attention Residuals

Paper • 2605.18855 • Published 9 days ago • 6
Steered LLM Activations are Non-Surjective

Paper • 2604.09839 • Published 15 days ago • 10
GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

Paper • 2605.15250 • Published 8 days ago • 3
Self-Distilled Agentic Reinforcement Learning

Paper • 2605.15155 • Published 8 days ago • 106
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Paper • 2605.13724 • Published 9 days ago • 96
F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

Paper • 2605.12995 • Published 9 days ago • 2
World Action Models: The Next Frontier in Embodied AI

Paper • 2605.12090 • Published 10 days ago • 64
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

Paper • 2605.12495 • Published 10 days ago • 35
World Model for Robot Learning: A Comprehensive Survey

Paper • 2605.00080 • Published 22 days ago • 16
Flow-OPD: On-Policy Distillation for Flow Matching Models

Paper • 2605.08063 • Published 14 days ago • 97
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Paper • 2602.12125 • Published Feb 12 • 66
KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning

Paper • 2602.00400 • Published Jan 30
OVD: On-policy Verbal Distillation

Paper • 2601.21968 • Published Jan 29 • 5
SODA: Semi On-Policy Black-Box Distillation for Large Language Models

Paper • 2604.03873 • Published 29 days ago • 2
Post-Trained MoE Can Skip Half Experts via Self-Distillation

Paper • 2605.18643 • Published 4 days ago • 29