New Benchmark Tests AI Search Agents' Ability to Ask Better Questions

DiscoBench, a new benchmark, evaluates how well AI search tools ask clarifying questions when users' requests are unclear, vague, or even factually incorrect. Asking such questions can help AI assistants avoid cascading errors in multi-step searches.

Researchers have released DiscoBench, a benchmark that tests how well AI search agents handle unclear, incomplete, or even factually incorrect user requests. Search agents powered by large language models (LLMs) often assume that user queries are perfectly clear, yet real-world searches are frequently vague or underspecified. DiscoBench evaluates whether an agent can recognize when it needs more information and ask clarifying questions that improve its search results.

This matters because AI search tools are increasingly part of everyday life, from supporting research to answering complex questions. In deep search scenarios, where the agent performs multiple rounds of retrieval and reasoning, an ambiguity in the original request can propagate along the reasoning chain and steer the agent onto an incorrect search trajectory. An assistant that asks for clarification can avoid these cascading mistakes and return more accurate answers. For example, if you ask an AI assistant to find information about a book but don't specify the title or author, a clarification-aware tool can ask follow-up questions to narrow down the search.
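To make the book example concrete, here is a minimal sketch of a "clarify before searching" check. It is a hypothetical illustration, not DiscoBench's method or any real agent's code: the `BookRequest` type, the `IDENTIFYING_SLOTS` list, and the `next_action` function are invented for this example, and it assumes the user's request has already been parsed into a title and an author.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookRequest:
    """Hypothetical parsed request; None (or empty) means the user didn't say."""
    title: Optional[str] = None
    author: Optional[str] = None


# Hypothetical assumption: either of these details is enough to start a search.
IDENTIFYING_SLOTS = ("title", "author")


def next_action(request: BookRequest) -> str:
    """Return a clarifying question if the request is underspecified, else a query."""
    known = [getattr(request, slot) for slot in IDENTIFYING_SLOTS if getattr(request, slot)]
    if not known:
        # Ask before retrieving, so a guessed book can't derail later search steps.
        return "CLARIFY: Do you know the book's title or author?"
    # Enough detail to search: build a simple keyword query from the known slots.
    return "SEARCH: " + " ".join(known)


print(next_action(BookRequest()))
print(next_action(BookRequest(author="Ursula K. Le Guin")))
```

A real search agent would have to decide on its own whether a request is ambiguous, which is the harder problem a benchmark like DiscoBench probes. The sketch only shows why asking first matters: the question is posed before any retrieval step can commit to a wrong interpretation.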