SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28 • 15
MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification Paper • 2505.18452 • Published May 24 • 4
RIPPLECOT: Amplifying Ripple Effect of Knowledge Editing in Language Models via Chain-of-Thought In-Context Learning Paper • 2410.03122 • Published Oct 4, 2024 • 2
mmBERT: A Modern Multilingual Encoder with Annealed Language Learning Paper • 2509.06888 • Published Sep 8 • 12
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29 • 18
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions Paper • 2506.23046 • Published Jun 29 • 1