A collection of models and datasets from the paper "Video-Based Reward Modeling for Computer-Use Agents". GitHub: https://github.com/limenlp/ExeVRM
AI & ML interests
Natural Language Processing
A collection of models and a dataset from the paper "The Hallucination Tax of Reinforcement Finetuning".
Papers from LIME Lab
- Safer-Instruct: Aligning Language Models with Automated Preference Data (Paper 2311.08685)
- CLIMB: A Benchmark of Clinical Bias in Large Language Models (Paper 2407.05250)
- WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback (Paper 2408.15549)
- Detecting and Filtering Unsafe Training Data via Data Attribution (Paper 2502.11411)
We perform difficulty estimation on popular math datasets.