ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM
Enterprise AI and ML, Foundation Models, Responsible AI
Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows
Agentic CLEAR: Automating Multi-Level Evaluation of LLM Agents