CelesteChen's Collections: reasoning
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
• 2411.08147
• Published
• 65
Reverse Thinking Makes LLMs Stronger Reasoners
Paper
• 2411.19865
• Published
• 23
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
• Published
• 94
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
• 2412.18925
• Published
• 107
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
Paper
• 2501.06590
• Published
• 11
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Paper
• 2501.12570
• Published
• 28
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
Paper
• 2501.13007
• Published
• 19
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper
• 2501.11425
• Published
• 109
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Paper
• 2501.10799
• Published
• 15
Process Reinforcement through Implicit Rewards
Paper
• 2502.01456
• Published
• 62
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper
• 2502.10454
• Published
• 7
Large Language Models and Mathematical Reasoning Failures
Paper
• 2502.11574
• Published
• 3
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Paper
• 2502.12054
• Published
• 7
LightThinker: Thinking Step-by-Step Compression
Paper
• 2502.15589
• Published
• 31
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Paper
• 2504.01943
• Published
• 15
MolmoAct: Action Reasoning Models that can Reason in Space
Paper
• 2508.07917
• Published
• 44
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Paper
• 2508.19229
• Published
• 20
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper
• 2508.14029
• Published
• 118
Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Paper
• 2602.08354
• Published
• 259