Agentic Policy Optimization via Instruction-Policy Co-Evolution Paper • 2512.01945 • Published 8 days ago • 3
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 64
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators Paper • 2403.16950 • Published Mar 25, 2024 • 4
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners Paper • 2406.02537 • Published Jun 4, 2024
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments Paper • 2406.11370 • Published Jun 17, 2024
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation Paper • 2502.00330 • Published Feb 1
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies Paper • 2502.02533 • Published Feb 4 • 4
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs Paper • 2503.05856 • Published Mar 7 • 7
Judging the Judges: A Collection of LLM-Generated Relevance Judgements Paper • 2502.13908 • Published Feb 19 • 5
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 42
AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning Paper • 2301.12132 • Published Jan 28, 2023 • 1
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering Paper • 2309.17249 • Published Sep 29, 2023
Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning Paper • 2310.12774 • Published Oct 19, 2023
Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems Paper • 2307.14031 • Published Jul 26, 2023
XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking Paper • 2204.05895 • Published Apr 12, 2022
A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems Paper • 2310.12892 • Published Oct 19, 2023
On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning Paper • 2312.13772 • Published Dec 21, 2023