Research via ArXiv cs.CL

Self-Calibrating LLMs via Test-Time Discriminative Distillation

Researchers propose a method to improve the calibration of large language models (LLMs) without labeled data. The technique leverages the models' internal signals to reduce overconfidence in answers.

Researchers have introduced a novel approach to enhance the calibration of large language models (LLMs) during inference. The method, termed test-time discriminative distillation, addresses the persistent issue of LLMs being overconfident in their responses, often expressing high certainty in incorrect answers. Unlike existing methods that require labeled validation data or incur significant inference costs, this technique operates without the need for additional labeled data, making it more practical for real-world applications.

The key insight is that LLMs already contain a better-calibrated signal than the one they verbalize. Specifically, the probability the model assigns to the token "True" when asked "Is this answer correct?" ($P(\text{True})$) is consistently better calibrated than the model's stated confidence. This gap points to the potential for self-calibration using the model's own internal signals: the proposed method distills the $P(\text{True})$ signal into the model's confidence estimates at test time, reducing overconfidence and improving reliability.
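The $P(\text{True})$ signal described above can be sketched concretely. A minimal illustration, assuming access to the model's next-token logits for the self-evaluation prompt (the logit values here are illustrative placeholders, not outputs of any real model):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a dict of token logits."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: v / total for tok, v in exps.items()}

def p_true(token_logits):
    """P(True): probability mass on 'True' vs 'False' for the prompt
    'Is this answer correct?', restricted to the two verdict tokens."""
    verdict = {tok: token_logits[tok] for tok in ("True", "False")}
    return softmax(verdict)["True"]

# Hypothetical next-token logits after the self-evaluation prompt.
logits = {"True": 2.0, "False": 0.5, "Maybe": -1.0}
confidence = p_true(logits)
print(round(confidence, 3))  # ~0.818 for these toy logits
```

In practice this signal would come from a real model's logits (e.g., the scores for the first generated token after the self-evaluation prompt); restricting the softmax to the two verdict tokens is one common normalization choice.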

The implications of this research are significant for applications requiring high accuracy and reliability, such as medical diagnosis, legal advice, and financial forecasting. By improving calibration, the method can enhance user trust and the overall effectiveness of LLM-based systems. Future work may explore integrating this approach with other calibration techniques to further refine model performance. Open questions remain around the method's scalability and its effectiveness across different types of distribution shifts.

#llms #calibration #machine-learning #research #ai-reliability #inference