JAVEDIT: Joint Audio-Visual Instruction-Guided Video Editing with Agentic Data Curation • Paper • 2606.03168 • Published about 1 month ago • 47
Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching • Paper • 2606.03577 • Published about 1 month ago • 16
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? • Paper • 2606.01247 • Published May 31 • 31
Exploring Spatial Intelligence from a Generative Perspective • Paper • 2604.20570 • Published Apr 22 • 23
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering • Paper • 2604.08209 • Published Apr 9 • 27