Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published Mar 19
Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR Paper • 2605.20164 • Published 26 days ago
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment Paper • 2605.19577 • Published 26 days ago
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 27 days ago
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models Paper • 2605.08472 • Published May 8
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis Paper • 2605.14392 • Published about 1 month ago
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards Paper • 2605.10899 • Published May 11
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents Paper • 2605.10832 • Published May 11
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction Paper • 2605.12070 • Published May 12
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive Paper • 2605.11518 • Published May 12
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification Paper • 2605.09269 • Published May 10
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning Paper • 2605.10488 • Published May 11
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories Paper • 2605.04036 • Published May 5
QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks Paper • 2605.24218 • Published 23 days ago
Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World Paper • 2605.26086 • Published 20 days ago
RUBRIC-ARROW: Alternating Pointwise Rubric Reward Modeling for LLM Post-training in Non-verifiable Domains Paper • 2605.29156 • Published 18 days ago
Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering Paper • 2605.29648 • Published 17 days ago
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards Paper • 2605.31584 • Published 16 days ago
GrepSeek: Training Search Agents for Direct Corpus Interaction Paper • 2605.29307 • Published 17 days ago
SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search Paper • 2605.29796 • Published 17 days ago
OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents Paper • 2606.02031 • Published 13 days ago
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs Paper • 2605.24202 • Published 23 days ago
PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training Paper • 2606.03264 • Published 12 days ago
Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning Paper • 2606.04923 • Published 11 days ago
Reinforcement Learning from Rich Feedback with Distributional DAgger Paper • 2606.05152 • Published 11 days ago
Compress-Distill: Reasoning Trace Compression for Efficient Knowledge Distillation Paper • 2606.05988 • Published 10 days ago
Critic-R: Improving Agentic Search using Instruction-tuned Retrievers with Natural Language Introspective Feedback Paper • 2606.00590 • Published 15 days ago
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research Paper • 2606.09730 • Published 6 days ago
FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents Paper • 2606.12087 • Published 4 days ago
EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge Paper • 2606.13120 • Published 3 days ago