Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 41
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28 • 37
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search Paper • 2502.02508 • Published Feb 4 • 23
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published Feb 5 • 18
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization Paper • 2502.04306 • Published Feb 6 • 20
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation Paper • 2502.03860 • Published Feb 6 • 25