Anchored Confabulation: How Partial Evidence Triggers Confident Hallucinations in LLMs
Researchers report that supplying one confirmed fact in a multi-step reasoning chain makes LLMs more likely to produce confidently wrong answers. The phenomenon, called anchored confabulation, challenges assumptions about how models handle partial evidence.

A new study on arXiv identifies a previously undocumented property of large language models (LLMs): providing one confirmed intermediate fact in a multi-step reasoning chain increases the model's tendency to produce confidently wrong answers before the full evidence is presented. The researchers term this phenomenon 'anchored confabulation': a partial anchor commits the model to confidently completing the remaining reasoning steps from its parametric memory rather than waiting for the missing evidence.
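The setup can be pictured with a simple illustration (not the paper's actual protocol): pose the same multi-hop question twice, once plain and once with a single intermediate fact confirmed up front, and compare how readily the model commits to an answer. The question, the confirmed fact, and the `query_model` helper below are all hypothetical placeholders.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client call; swap in a real provider here."""
    return "<model response goes here>"


# A generic two-hop question; the bracketed entities are illustrative placeholders.
question = (
    "In which year did the founder of [Company A] move to the city "
    "where [Product B] was first manufactured?"
)

# Unanchored: the model sees only the question.
unanchored_prompt = f"Question: {question}\nThink step by step, then answer."

# Anchored: one intermediate hop is confirmed; the other hops are still unverified.
anchored_prompt = (
    "Confirmed fact: [Product B] was first manufactured in [City C].\n"
    f"Question: {question}\nThink step by step, then answer."
)

for label, prompt in (("unanchored", unanchored_prompt), ("anchored", anchored_prompt)):
    print(f"--- {label} ---")
    print(query_model(prompt))
```

Under the anchored-confabulation account, the second prompt would be the one more likely to yield a fluent, confident, and wrong completion of the unverified hops.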
The study formalizes this behavior as Parametric Hallucination Confidence (PHC) and establishes the effect across six lines of evidence, including a causal injection experiment. The findings challenge the assumption that partial evidence improves reasoning accuracy, with significant implications for applications that rely on multi-step reasoning, such as legal analysis, medical diagnosis, and complex decision-making.
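The paper's exact PHC formulation isn't reproduced in this summary. As a rough sketch of what a parametric-confidence score could look like, the snippet below uses a common proxy, the geometric mean of answer-token probabilities; the numbers are made up for demonstration, and this is an assumption for illustration, not the authors' metric.

```python
import math


def mean_logprob_confidence(token_logprobs: list[float]) -> float:
    """One plausible confidence proxy (assumption, not the paper's PHC definition):
    average per-token log-probability of the generated answer, mapped into (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg)  # geometric-mean token probability


# Illustrative comparison: an anchored run whose answer tokens are all high probability
# versus a more hesitant unanchored run. All values are invented for demonstration.
anchored = [-0.05, -0.02, -0.08, -0.03]
unanchored = [-0.90, -1.40, -0.60, -1.10]

print("anchored confidence:  ", round(mean_logprob_confidence(anchored), 3))    # ~0.96
print("unanchored confidence:", round(mean_logprob_confidence(unanchored), 3))  # ~0.37
```

A gap like this between anchored and unanchored runs, paired with wrong anchored answers, is the kind of signature the causal injection experiment is designed to surface.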
The research suggests that developers should rethink how LLM-based systems handle partial evidence. Future work may focus on mitigating anchored confabulation by improving calibration mechanisms or developing architectures that handle partial information more robustly. The study also raises questions about the reliability of LLMs in high-stakes scenarios where partial evidence is common.