CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation Paper • 2505.24456 • Published May 30
A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge'ez Script Paper • 2507.15142 • Published Jul 20
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards Paper • 2511.07403 • Published 26 days ago • 13
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning Paper • 2510.02240 • Published Oct 2 • 17
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23 • 55
VideoLucy: Deep Memory Backtracking for Long Video Understanding Paper • 2510.12422 • Published Oct 14 • 1
BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks Paper • 2510.02418 • Published Oct 2 • 2
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9 • 35
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models Paper • 2510.06107 • Published Oct 7 • 2
See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation Paper • 2509.22653 • Published Sep 26 • 23
EmbeddingGemma: Powerful and Lightweight Text Representations Paper • 2509.20354 • Published Sep 24 • 41
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Paper • 2506.03144 • Published Jun 3 • 7
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras Paper • 2507.17664 • Published Jul 23 • 1
aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists Paper • 2508.15126 • Published Aug 20 • 20
Grounding Multilingual Multimodal LLMs With Cultural Knowledge Paper • 2508.07414 • Published Aug 10 • 1