Introduction
Depression is an affective disorder that contributes substantially to the global burden of disease. Measurement-Based Care (MBC) is advocated throughout the full course of treatment, with symptom assessment as a key component. Rating scales are widely used as convenient and powerful assessment tools, but their results are affected by the subjectivity and consistency of the raters. The assessment of depressive symptoms is usually conducted with a clear purpose and restricted content, as in clinical interviews based on the Hamilton Depression Rating Scale (HAMD), so the results are easy to obtain and quantify. Artificial Intelligence (AI) techniques offer objective, stable, and consistent performance, making them suitable for assessing depressive symptoms. Therefore, this study applied Deep Learning (DL)-based Natural Language Processing (NLP) techniques to assess depressive symptoms during clinical interviews: we proposed an algorithmic model, explored the feasibility of the techniques, and evaluated their performance.

Methods
The study included 329 patients with a Major Depressive Episode. Clinical interviews based on the HAMD-17 were conducted by trained psychiatrists, and the speech was recorded simultaneously. A total of 387 audio recordings were included in the final analysis. We propose a deep time-series semantic model for the assessment of depressive symptoms based on multi-granularity and multi-task joint training (MGMT).

Results
MGMT performs acceptably in assessing depressive symptoms, with an F1 score (a metric of model performance, the harmonic mean of precision and recall) of 0.719 for classifying the four-level severity of depression and an F1 score of 0.890 for identifying the presence of depressive symptoms.

Discussion
This study demonstrates the feasibility of applying DL-based NLP techniques to clinical interviews for the assessment of depressive symptoms. However, this study has limitations, including the limited sample size and the fact that assessing depressive symptoms from speech content alone discards information gained through observation. A multi-dimensional model combining semantics with vocal features, facial expressions, and other valuable information, while also taking personalized information into account, is a possible direction for future work.
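For reference, the F1 score cited in the Results is the harmonic mean of precision and recall:

$$
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$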
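To illustrate the multi-task joint training idea named in the Methods, the sketch below pairs a shared time-series text encoder with two classification heads, one for the four-level severity task and one for the binary presence task reported in the Results. This is a minimal sketch under assumed components (an LSTM encoder, equal task weighting, and illustrative hyperparameters such as the vocabulary size), not the authors' MGMT implementation.

```python
# Minimal multi-task joint training sketch for depressive-symptom assessment.
# Encoder choice, dimensions, and loss weighting are illustrative assumptions,
# not the MGMT architecture reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskSeverityModel(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Time-series semantic encoder over interview transcript tokens.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        # Task 1: four-level depression severity.
        self.severity_head = nn.Linear(hidden_dim, 4)
        # Task 2: presence/absence of depressive symptoms.
        self.presence_head = nn.Linear(hidden_dim, 2)

    def forward(self, token_ids):
        emb = self.embed(token_ids)
        _, (h_n, _) = self.encoder(emb)
        pooled = h_n[-1]  # final hidden state of the last LSTM layer
        return self.severity_head(pooled), self.presence_head(pooled)


def joint_loss(severity_logits, presence_logits, severity_y, presence_y, w=1.0):
    """Joint objective: sum of the two task losses (equal weighting assumed)."""
    return (F.cross_entropy(severity_logits, severity_y)
            + w * F.cross_entropy(presence_logits, presence_y))


if __name__ == "__main__":
    model = MultiTaskSeverityModel()
    tokens = torch.randint(0, 5000, (8, 200))   # batch of 8 tokenized transcripts
    severity_y = torch.randint(0, 4, (8,))      # four-level severity labels
    presence_y = torch.randint(0, 2, (8,))      # binary presence labels
    sev_logits, pres_logits = model(tokens)
    loss = joint_loss(sev_logits, pres_logits, severity_y, presence_y)
    loss.backward()
    print(loss.item())
```

Training both heads against a shared encoder is one common way to realize joint training: the severity and presence objectives regularize each other through the shared representation.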