July 15, 2025
research
Discerning mental illness with voice

Psyrin has developed an automated speech analysis system that can distinguish between mental health conditions with notable accuracy. Published in Translational Psychiatry, this research was led by Dr. Julianna Olah, Psyrin's CTO, and represents a significant step towards digital mental health diagnostics.
Our study addresses critical gaps in psychiatric assessment by demonstrating that sophisticated machine learning algorithms can analyze brief speech samples to identify subtle differences between conditions on the psychosis spectrum. Using a large dataset of 1,140 participants and over 22,000 minutes of speech, we tested whether our system could differentiate between healthy controls (HC), individuals with subclinical psychotic experiences (SPE), bipolar disorder (BD), schizophrenia spectrum disorders (SSD), and major depressive disorder (MDD). The results show unprecedented classification accuracy across multiple diagnostic categories, a task that traditionally requires lengthy clinical interviews by experts.
Voice analysis has emerged as a particularly promising approach for mental health assessment because speech contains rich information about cognitive, emotional, and motor function, all domains affected by psychiatric conditions. Changes in vocal expression and poverty of speech are hallmark features of psychotic disorders, while alterations in prosody, rhythm, and speech content can signal affective conditions. What makes voice especially valuable is its accessibility: it can be captured remotely using common devices, doesn't require specialized equipment, and provides objective data that complements traditional clinical assessments.
Our methodology prioritized scalability and real-world applicability. Participants completed a series of voice recording tasks online via their own devices. We then extracted 150+ unique features from these recordings, including linguistic and paralinguistic parameters. Importantly, prompted speech tasks proved more informative than text reading, allowing us to optimize a brief 5-minute assessment protocol.
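To give a flavour of what "paralinguistic features" means in practice, here is a minimal, illustrative sketch, not Psyrin's actual pipeline: two toy features (pause ratio from frame energy, and a crude pitch estimate via autocorrelation) computed from a raw waveform with only NumPy. The function name, thresholds, and frame sizes are all assumptions for illustration.

```python
import numpy as np

def paralinguistic_features(signal, sr=16000, frame_ms=25, hop_ms=10):
    """Illustrative paralinguistic features (not the production system):
    energy-based pause ratio plus a simple F0 estimate via autocorrelation."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame, hop)]
    energies = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    # frames quieter than 10% of the peak energy count as pauses (toy threshold)
    voiced = energies > 0.1 * energies.max()
    pause_ratio = 1.0 - voiced.mean()
    # crude F0 estimate on the loudest frame: peak of the autocorrelation
    # restricted to lags corresponding to 75-400 Hz
    f = frames[int(np.argmax(energies))]
    ac = np.correlate(f, f, mode="full")[len(f) - 1:]
    lo, hi = sr // 400, sr // 75
    f0 = sr / (lo + int(np.argmax(ac[lo:hi])))
    return {"pause_ratio": float(pause_ratio), "f0_hz": float(f0)}

# synthetic example: 1 s of a 200 Hz tone followed by 1 s of silence
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([np.sin(2 * np.pi * 200 * t), np.zeros(sr)])
feats = paralinguistic_features(audio, sr)
```

Real systems extract many more features (speech rate, jitter, shimmer, linguistic content), but each follows the same pattern: a compact number summarizing one aspect of how a person speaks.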
What makes our work particularly distinctive is our focus on multi-class classification rather than the simpler binary classifications (e.g., disorder vs. healthy control) that dominate existing research. This approach more closely resembles real clinical scenarios, where clinicians must differentiate between several possible diagnoses with overlapping symptoms. We validated our models using rigorous train-test splits and cross-validation to ensure reliable performance measurement.
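The evaluation pattern described above, multi-class prediction scored with k-fold cross-validation, can be sketched in a few lines. This is a toy stand-in (synthetic features, a nearest-centroid classifier, hand-rolled folds), not our actual models or data; the point is only the train/test discipline.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic stand-in for speech features: 4 classes (think HC, SPE, BD, SSD),
# 60 samples each, 10 features with class-dependent means
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(60, 10)) for c in range(4)])
y = np.repeat(np.arange(4), 60)

def nearest_centroid_fit(X, y):
    # one mean vector per class
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

def cross_val_accuracy(X, y, k=5, seed=0):
    # shuffle once, split into k folds; each fold is held out exactly once
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = nearest_centroid_fit(X[train], y[train])
        pred = nearest_centroid_predict(model, X[test])
        accs.append((pred == y[test]).mean())
    return float(np.mean(accs))

acc = cross_val_accuracy(X, y)
```

Because every sample is scored only by a model that never saw it during training, the resulting accuracy estimates how the classifier would behave on genuinely new speakers.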
The results exceeded expectations across multiple classification tasks. Using just 5 minutes of speech with demographic information, our model distinguished between healthy controls and serious mental illness (SSD or BD) with an impressive 0.91 AUC. More remarkably, when classifying across four categories simultaneously (HC, SPE, BD, SSD), the model maintained 86% accuracy with balanced performance across all groups. Even when expanded to five categories including MDD, accuracy remained strong at 76%, significantly outperforming chance levels and demonstrating that our system can capture nuanced, disorder-specific speech patterns rather than simply identifying general distress.
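For readers less familiar with the 0.91 AUC figure: AUC is the probability that a randomly chosen case is scored higher than a randomly chosen control, so 0.5 is chance and 1.0 is perfect separation. A minimal sketch of how it is computed from raw scores (via the Mann-Whitney U statistic; the toy labels and scores below are invented for illustration):

```python
import numpy as np

def auc_score(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative one, with half-credit for ties (Mann-Whitney U)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# toy example: 1 = serious mental illness, 0 = healthy control
y = np.array([0, 0, 0, 1, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.3])
auc = auc_score(y, s)  # 7 of 9 positive/negative pairs correctly ordered
```

An AUC of 0.91 therefore means that in about nine out of ten such pairings, the model ranks the affected individual above the healthy control.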
The clinical implications of this technology are substantial. Our system could potentially accelerate the diagnostic process, which currently can take months or years, particularly for conditions on the psychosis spectrum. By providing an objective, standardized assessment tool that can be deployed remotely, we could significantly improve access to care, especially in underserved regions with limited psychiatric specialists.
While these results are promising, we recognize that further validation in diverse clinical populations is essential. We are currently conducting prospective studies to validate these models in patients with clinically confirmed diagnoses across multiple sites. Our vision is to create a comprehensive digital assessment platform that combines speech analysis with other digital biomarkers to support clinicians throughout the care continuum. Watch this space!