DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 7 days ago • 201
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality? Paper • 2605.22109 • Published 6 days ago • 165
MOCHA: Multi-Objective Chebyshev Annealing for Agent Skill Optimization Paper • 2605.19330 • Published 8 days ago • 8
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 15 days ago • 191
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation Paper • 2605.11739 • Published 14 days ago • 58
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive Paper • 2605.11518 • Published 15 days ago • 4
MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning Paper • 2605.13037 • Published 14 days ago • 8
Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation Paper • 2605.01284 • Published 25 days ago • 3
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding Paper • 2604.26779 • Published 28 days ago • 13
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 242
Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment Paper • 2604.05684 • Published Apr 7 • 9
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 342
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 351