Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published 5 days ago • 56
SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training Paper • 2602.03411 • Published 14 days ago • 36
SWE-World: Building Software Engineering Agents in Docker-Free Environments Paper • 2602.03419 • Published 14 days ago • 39
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds Paper • 2412.05631 • Published Dec 7, 2024 • 2
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 20 days ago • 118
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 20 days ago • 118 • 19
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published 20 days ago • 118 • 19
CharacterBox: Evaluating the Role-Playing Capabilities of LLMs in Text-Based Virtual Worlds Paper • 2412.05631 • Published Dec 7, 2024 • 2