Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 107
Back to The Future: Evaluating AI Agents on Predicting Future Events Article • vinid, junlinw, zainhasan, shangzhu, coolcat21, clefourrier, jameszou • Jul 17, 2025 • 52
How to Train Your LLM Web Agent: A Statistical Diagnosis Paper • 2507.04103 • Published Jul 5, 2025 • 52
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content Paper • 2406.11811 • Published Jun 17, 2024 • 16