rm-robustness

JW17

authored 2 papers 6 months ago

AlphaPO -- Reward shape matters for LLM alignment

Paper • 2501.03884 • Published Jan 7, 2025 • 2

Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning

Paper • 2504.03380 • Published Apr 4, 2025

JW17

authored a paper 8 months ago

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

Paper • 2505.11855 • Published May 17, 2025 • 10

amphora

authored a paper 8 months ago

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

Paper • 2505.11855 • Published May 17, 2025 • 10

JW17

updated a collection 8 months ago

[ICML 2025] Robustness in RMs

Collection

Dataset and reward models for "On the Robustness of Reward Models for Language Model Alignment (ICML 2025)" • 8 items • Updated May 27, 2025

JW17

updated a model 8 months ago

rm-robustness/L31-8B-SKPv2-BSR-1e4

Text Classification • 8B • Updated May 11, 2025 • 6

JW17

published a model 8 months ago

rm-robustness/L31-8B-SKPv2-BSR-1e4

Text Classification • 8B • Updated May 11, 2025 • 6

JW17

updated a model 8 months ago

rm-robustness/L31-8B-SKPv2-BSR-1e3

Text Classification • 8B • Updated May 11, 2025 • 6

JW17

published a model 8 months ago

rm-robustness/L31-8B-SKPv2-BSR-1e3

Text Classification • 8B • Updated May 11, 2025 • 6

JW17

updated a model 8 months ago

rm-robustness/L31-8B-SKPv2-BSR-1e2

Text Classification • 8B • Updated May 11, 2025 • 7

JW17

published a model 8 months ago

rm-robustness/L31-8B-SKPv2-BSR-1e2

Text Classification • 8B • Updated May 11, 2025 • 7

JW17

updated a dataset 8 months ago

rm-robustness/ultrafeedback-valid-4-mutual-ood

Viewer • Updated May 11, 2025 • 11.1k • 8

JW17

published a dataset 8 months ago

rm-robustness/ultrafeedback-valid-4-mutual-ood

Viewer • Updated May 11, 2025 • 11.1k • 8

Team members 4
