AI Scoring System Reveals How Good Our Explanations Really Are

Researchers have developed an AI-based way to measure the quality of written explanations. They analyzed more than 55,000 forecasts and their accompanying rationales from a forecasting tournament, and identified reasoning patterns that help assess how well people justify their judgments. The approach could improve how we evaluate both expert opinions and AI reasoning.

The researchers introduced Explanation Quality Markers (EQMs), an AI-based system for evaluating the quality of written explanations. The system analyzes natural-language rationales paired with probabilistic judgments and scores them against real-world outcomes. It relies on sixty theory-guided reasoning patterns, identified in each rationale by large language models (LLMs), to assess how well an explanation holds up. The analysis was pre-registered and covered more than 55,000 forecast-rationale pairs from a multiyear forecasting tournament.
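
To make "scoring rationales against real outcomes" more concrete, here is a toy sketch. It is not the researchers' method. It assumes that forecast accuracy is measured with the Brier score, a standard squared-error rule for probability forecasts; the article does not name the scoring rule used. In place of an LLM-detected reasoning pattern, it uses a crude keyword check. The names `ForecastRationale`, `mentions_base_rate` and `marker_accuracy_gap` are hypothetical, and the four example pairs are invented for illustration. None of them are data from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ForecastRationale:
    probability: float  # forecaster's stated probability that the event occurs
    outcome: int        # 1 if the event occurred, 0 otherwise
    rationale: str      # free-text justification for the forecast


def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a probability forecast and a binary outcome (lower is better)."""
    return (probability - outcome) ** 2


def mentions_base_rate(rationale: str) -> bool:
    """Hypothetical marker: a keyword check standing in for an LLM-detected reasoning pattern."""
    text = rationale.lower()
    return "base rate" in text or "historically" in text


def marker_accuracy_gap(pairs, marker) -> float:
    """Mean Brier score without the marker minus mean Brier score with it.

    A positive gap means forecasts whose rationales show the marker were more accurate.
    """
    with_marker = [brier_score(p.probability, p.outcome) for p in pairs if marker(p.rationale)]
    without_marker = [brier_score(p.probability, p.outcome) for p in pairs if not marker(p.rationale)]
    if not with_marker or not without_marker:
        raise ValueError("need examples both with and without the marker")
    return mean(without_marker) - mean(with_marker)


# Invented toy data, not from the tournament.
pairs = [
    ForecastRationale(0.8, 1, "Historically, incumbents win about 80% of the time."),
    ForecastRationale(0.3, 0, "The base rate for this kind of event is low."),
    ForecastRationale(0.9, 0, "I just have a strong feeling."),
    ForecastRationale(0.5, 1, "Hard to say."),
]

gap = marker_accuracy_gap(pairs, mentions_base_rate)
print(f"Accuracy gap: {gap:.3f}")  # prints "Accuracy gap: 0.465"
```

A real system would replace the keyword check with an LLM judging whether each of many reasoning patterns is present. It would also need far more data and proper statistical controls before drawing any conclusion about which patterns predict accuracy.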

This matters because it offers a way to check whether an expert's reasoning deserves trust. Whether a doctor is explaining a diagnosis or an AI is justifying a decision, knowing if the explanation is sound makes a big difference. In principle, the same idea could bring similar transparency to other settings, such as scoring how well a manager justifies a promotion decision.

You can read the full research paper on arXiv to see how EQMs work and which patterns they identify. To find it, search for 'Measuring Judgment Quality in Natural-Language Explanations' on the arXiv website.