Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11 • 46
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2 • 56
Skywork-Reward-V2 Collection Scaling preference data curation to the extreme • 9 items • Updated Jul 4 • 24
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs Paper • 2506.19290 • Published Jun 24 • 52
Reward Bench 2 Collection Datasets, spaces, and models for Reward Bench 2 benchmark and paper! • 11 items • Updated 8 days ago • 16
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs Paper • 2410.18451 • Published Oct 24, 2024 • 20
Large Language Model Unlearning via Embedding-Corrupted Prompts Paper • 2406.07933 • Published Jun 12, 2024 • 9
Rethinking Machine Unlearning for Large Language Models Paper • 2402.08787 • Published Feb 13, 2024 • 3