WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation Paper • 2508.16763 • Published Aug 22 • 2
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning Paper • 2508.09804 • Published Aug 13
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published 29 days ago • 104
ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods Paper • 2110.02871 • Published Oct 6, 2021
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024 • 16
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation Paper • 2211.06687 • Published Nov 12, 2022 • 4
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning Paper • 2507.06542 • Published Jul 9
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation Paper • 2406.07529 • Published Jun 11, 2024
Improving GUI Grounding with Explicit Position-to-Coordinate Mapping Paper • 2510.03230 • Published Oct 3 • 3
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models Paper • 2510.05150 • Published Oct 2
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29 • 219
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering Paper • 2510.10828 • Published Oct 12 • 1
StarFlow: Generating Structured Workflow Outputs From Sketch Images Paper • 2503.21889 • Published Mar 27 • 2