Reasoning Models
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper
• 2501.18585
• Published
• 61
LLMs Can Easily Learn to Reason from Demonstrations: Structure, not
content, is what matters!
Paper
• 2502.07374
• Published
• 40
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
S*: Test Time Scaling for Code Generation
Paper
• 2502.14382
• Published
• 63
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published
• 113
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
Reinforcing Learning
Paper
• 2503.05379
• Published
• 38
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published
• 57
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
Cognitive-Inspired Sketching
Paper
• 2503.05179
• Published
• 46
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
Reinforcement Learning
Paper
• 2503.07365
• Published
• 61
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through
Two-Stage Rule-Based RL
Paper
• 2503.07536
• Published
• 88
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
• 2503.12605
• Published
• 35
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
• Published
• 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs
for Knowledge-Intensive Visual Grounding
Paper
• 2503.12797
• Published
• 32
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
Paper
• 2503.20641
• Published
• 10
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published
• 62
Effectively Controlling Reasoning Models through Thinking Intervention
Paper
• 2503.24370
• Published
• 19
A Survey of Efficient Reasoning for Large Reasoning Models: Language,
Multimodality, and Beyond
Paper
• 2503.21614
• Published
• 43
Exploring the Effect of Reinforcement Learning on Video Understanding:
Insights from SEED-Bench-R1
Paper
• 2503.24376
• Published
• 38
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies
Ahead
Paper
• 2504.00294
• Published
• 10
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective
Reinforcement Learning for LLM Reasoning
Paper
• 2506.01939
• Published
• 188
Learning What Reinforcement Learning Can't: Interleaved Online
Fine-Tuning for Hardest Questions
Paper
• 2506.07527
• Published
• 3
The Illusion of Thinking: Understanding the Strengths and Limitations of
Reasoning Models via the Lens of Problem Complexity
Paper
• 2506.06941
• Published
• 16
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
Paper
• 2506.07976
• Published
• 6
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
• 2506.18896
• Published
• 29
Kwai Keye-VL Technical Report
Paper
• 2507.01949
• Published
• 131
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper
• 2506.18254
• Published
• 32
Perception-Aware Policy Optimization for Multimodal Reasoning
Paper
• 2507.06448
• Published
• 48
Test-Time Scaling with Reflective Generative Model
Paper
• 2507.01951
• Published
• 108
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality,
Long Context, and Next Generation Agentic Capabilities
Paper
• 2507.06261
• Published
• 67
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for
Visual Reasoning
Paper
• 2507.05255
• Published
• 75
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning
Systems in LLMs
Paper
• 2507.09477
• Published
• 88
The Invisible Leash: Why RLVR May Not Escape Its Origin
Paper
• 2507.14843
• Published
• 85
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published
• 317
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy
Optimization
Paper
• 2507.15758
• Published
• 35
THU-KEG/LongWriter-Zero-32B
Text Generation
• 33B • Updated
• 19
• 111
MUR: Momentum Uncertainty guided Reasoning for Large Language Models
Paper
• 2507.14958
• Published
• 47
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published
• 158
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published
• 50
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual
Mathematical Reasoning
Paper
• 2508.10433
• Published
• 144
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Paper
• 2510.08872
• Published
• 4
RL makes MLLMs see better than SFT
Paper
• 2510.16333
• Published
• 49
Scaling Latent Reasoning via Looped Language Models
Paper
• 2510.25741
• Published
• 229
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise
Reasoning
Paper
• 2510.25992
• Published
• 48
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
• Published
• 93
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
Paper
• 2512.23988
• Published
• 18
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
Paper
• 2601.05432
• Published
• 167
PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
Paper
• 2602.02493
• Published
• 42
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
Paper
• 2602.02488
• Published
• 32
Code2World: A GUI World Model via Renderable Code Generation
Paper
• 2602.09856
• Published
• 193