Nazzaroth2's Collections RL_Papers in general
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
Paper
• 2504.12322
• Published
• 28
Learning to Reason under Off-Policy Guidance
Paper
• 2504.14945
• Published
• 88
TTRL: Test-Time Reinforcement Learning
Paper
• 2504.16084
• Published
• 120
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published
• 189
Reasoning Models Better Express Their Confidence
Paper
• 2505.14489
• Published
• 20
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Paper
• 2505.17941
• Published
• 25
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper
• 2505.24864
• Published
• 143
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Paper
• 2506.16141
• Published
• 27