SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling Paper • 2512.23162 • Published 3 days ago • 8
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents Paper • 2512.14014 • Published 16 days ago • 2
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27, 2025 • 53
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25, 2025 • 31
Technical Report of TeleChat2, TeleChat2.5 and T1 Paper • 2507.18013 • Published Jul 24, 2025 • 10
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7, 2025 • 64
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation Paper • 2506.21876 • Published Jun 27, 2025 • 28
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Paper • 2504.14239 • Published Apr 19, 2025 • 14
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Paper • 2504.11468 • Published Apr 10, 2025 • 30
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents Paper • 2504.00906 • Published Apr 1, 2025 • 27
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27, 2025 • 62