Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning Paper • 2511.19900 • Published 13 days ago • 46
Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery Paper • 2508.17380 • Published Aug 24 • 6
Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards Paper • 2509.21882 • Published Sep 26
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning Paper • 2511.16043 • Published 18 days ago • 104
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 140
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8 • 75
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning Paper • 2506.00555 • Published May 31 • 1
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models Paper • 2506.17667 • Published Jun 21 • 4
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 89
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization Paper • 2412.06141 • Published Dec 9, 2024
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Paper • 2502.01719 • Published Feb 3
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding Paper • 2503.13964 • Published Mar 18 • 20
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models Paper • 2410.13085 • Published Oct 16, 2024 • 24
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models Paper • 2410.10139 • Published Oct 14, 2024 • 51