AI Struggles with Basic IT Tasks in New IBM Benchmark

A new benchmark finds that even advanced AI models score below 50% on simple IT tasks, a sign of how far AI still has to go in meeting real-world enterprise needs. The benchmark is open source, so anyone can test their own models.

IBM and Artificial Analysis have released ITBench-AA, the first benchmark for testing AI models on real-world IT tasks. It evaluates how well AI handles simple IT operations such as troubleshooting, system configuration, and basic network management. Even the most advanced models scored below 50%.

The result matters because it suggests that AI isn't yet ready to fully replace human IT professionals. While AI can assist with complex tasks, it still lacks the nuanced understanding that everyday IT work requires. Businesses that rely on AI for IT support may find the technology less reliable than they had hoped.

If you're curious about how AI performs on IT tasks, you can test your own models with the open-source ITBench-AA benchmark. Visit the Hugging Face blog to learn more and download the benchmark tools. Running it yourself is a good way to see firsthand how AI measures up to real-world IT challenges.