PivotRL: High Accuracy Agentic Post-Training at Low Compute Cost Paper • 2603.21383 • Published 1 day ago • 6
Effective Strategies for Asynchronous Software Engineering Agents Paper • 2603.21489 • Published 1 day ago • 1
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation Paper • 2603.22117 • Published about 16 hours ago • 7
WorldCache: Content-Aware Caching for Accelerated Video World Models Paper • 2603.22286 • Published about 14 hours ago • 3
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD Paper • 2603.20155 • Published 4 days ago • 7
WorldAgents: Can Foundation Image Models be Agents for 3D World Models? Paper • 2603.19708 • Published 4 days ago • 10
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents Paper • 2603.19685 • Published 4 days ago • 14
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published 5 days ago • 10
Reasoning over mathematical objects: on-policy reward modeling and test time aggregation Paper • 2603.18886 • Published 5 days ago • 3
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published 5 days ago • 54
PRISM: Demystifying Retention and Interaction in Mid-Training Paper • 2603.17074 • Published 7 days ago • 1
LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition Paper • 2603.17965 • Published 6 days ago • 5
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Paper • 2603.18004 • Published 6 days ago • 12