RiddleBench: A New Generative Reasoning Benchmark for LLMs Paper • 2510.24932 • Published Oct 28, 2025
AyurParam: A State-of-the-Art Bilingual Language Model for Ayurveda Paper • 2511.02374 • Published Nov 4, 2025
LTD-Bench: Evaluating Large Language Models by Letting Them Draw Paper • 2511.02347 • Published Nov 4, 2025
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29, 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24, 2025
WebSailor: Navigating Super-human Reasoning for Web Agent Paper • 2507.02592 • Published Jul 3, 2025
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20, 2025
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30, 2025
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025