AI Detects Depression from Routine Primary Care Conversations with 85% Accuracy

Researchers analyzed 1,108 audio-recorded primary care visits to train AI models that detect depression from naturalistic dialogue. The best-performing model, which combines Sentence-BERT with logistic regression, reached about 85% accuracy in identifying patients with PHQ-9-confirmed depression.

A new study published on arXiv introduces an automated system that detects depression from the language of audio-recorded routine primary care visits. The research team analyzed 1,108 encounters from the Establishing Focus study, using the PHQ-9 questionnaire as ground truth to label 253 patients as depressed and 855 as non-depressed. The study compares three supervised approaches: Sentence-BERT paired with logistic regression, LIWC features with logistic regression, and a modern transformer-based model. Its results suggest that linguistic signals embedded in everyday doctor-patient dialogue can serve as useful markers for mental health screening.
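To put the headline accuracy in context, it helps to work out the class balance implied by the label counts above. The short Python sketch below does only that arithmetic. The variable names are illustrative, and the majority-class baseline is a standard reference point computed here from the reported counts, not a figure taken from the study.

```python
# Label counts reported in the article (PHQ-9 used as ground truth).
depressed = 253
non_depressed = 855
total = depressed + non_depressed  # 1,108 encounters

# Share of encounters labeled depressed.
prevalence = depressed / total

# Accuracy of a trivial classifier that always predicts "non-depressed".
# This baseline is an illustrative reference point, not a study result.
majority_baseline = non_depressed / total

print(f"total encounters: {total}")
print(f"prevalence of depression labels: {prevalence:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Because roughly 77% of encounters are labeled non-depressed, a model that never flags anyone would already score about 77% on accuracy alone. That is why class-sensitive measures such as sensitivity and specificity matter when judging a screening tool on an imbalanced dataset like this one.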

This development matters because depression remains widely underdiagnosed in primary care, often because appointment time is limited and self-reporting is subjective. Traditional screening relies on patients voluntarily disclosing symptoms or completing paper forms, and either step can be skipped or delayed. By building on the growing adoption of digital scribing technologies that record clinical conversations, this AI approach offers a passive, non-intrusive way to flag at-risk patients in real time. It turns recordings that are already being made for documentation into a clinical screening tool, potentially narrowing the gap between patient distress and professional intervention without adding burden to the visit.

While the study highlights the promise of natural language processing for mental health, several questions remain about deployment and ethics. The authors note the need for rigorous validation across diverse demographics to ensure the models do not perpetuate biases present in the training data. Integrating such systems into electronic health records (EHRs) will also require careful handling of patient privacy and informed consent, as well as of false positives that could cause unnecessary anxiety. As digital scribing becomes standard, the next phase will involve clinical trials to determine whether these automated alerts improve diagnosis rates and patient outcomes in real-world practice.