Leaderboard for magebench
Preference Proxy Evaluations
Filter prompts based on similarity and language
Browse and view model judgments in benchmarks
Display text leaderboard
Track GitHub team management statistics for SWE assistants
Track GitHub wiki statistics for SWE assistants
Track GitHub releases statistics for SWE assistants
Submit model predictions and view leaderboard
Multi-run AutoBench leaderboard with historical navigation
Ranking of LLMs for agentic tasks
Display MTEB Arena interface
Track GitHub PR statistics for SWE assistants
Track GitHub review statistics for SWE assistants