ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413634

Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus

Abstract: Early diagnosis of Neurocognitive Disorder (NCD) is crucial in facilitating preventive care and timely treatment to delay further progression. This paper presents the development of a state-of-the-art automatic speech recognition (ASR) system built on the DementiaBank Pitt corpus for automatic NCD detection. Speed perturbation based audio data augmentation expanded the limited elderly speech data fourfold. Large quantities of out-of-domain, non-aged adult speech were exploited by cross-domain adapting a …
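The abstract does not list the perturbation factors that were used. As a rough illustration of what speed perturbation does, the minimal sketch below resamples a waveform by linear interpolation so that it plays `factor` times faster at the original sample rate, which changes both duration and pitch (unlike tempo stretching, which preserves pitch). The function names `speed_perturb` and `augment` and the factors (0.9, 1.1, 1.2) are hypothetical, chosen only so that the original data plus three perturbed copies gives four times as many utterances; production toolkits normally use a higher-quality resampler than linear interpolation.

```python
import math
from typing import List, Sequence


def speed_perturb(signal: Sequence[float], factor: float) -> List[float]:
    """Resample `signal` so it plays `factor` times faster at the same sample rate.

    factor > 1 shortens the signal (faster, higher pitch);
    factor < 1 lengthens it (slower, lower pitch).
    Linear interpolation is used here purely for illustration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(math.floor(len(signal) / factor))
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor              # fractional read position in the input
        j = int(pos)
        frac = pos - j
        # Repeat the last sample when interpolating past the end.
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append((1.0 - frac) * signal[j] + frac * nxt)
    return out


def augment(utterances: List[List[float]], factors: Sequence[float]) -> List[List[float]]:
    """Return the original utterances plus one perturbed copy per factor (hypothetical helper)."""
    out = list(utterances)
    for f in factors:
        out.extend(speed_perturb(u, f) for u in utterances)
    return out
```

For example, `augment(data, (0.9, 1.1, 1.2))` returns four versions of every utterance: the original and three speed-perturbed copies.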

Cited by 25 publications (24 citation statements). References: 35 publications.
“…Inspired by the spectro-temporal level differences between impaired speech and normal speech such as overall reduction of speech volume, changes in the spectral envelope shape, weakened formants and slower speaking rate, recent research in this direction has been largely focused on front-end signal processing based techniques including tempo-stretching [69], [70], VTLP [62], and speed perturbation [15], [63], of normal speech recorded from healthy control speakers. The resulting speech data exhibiting certain high-level attributes such as a slower speaking rate and reduced speech volume is then used to augment the limited dysarthric or elderly speech training data [14], [15], [33]. A range of speech augmentation approaches investigated for dysarthric speech recognition [15] suggest the combined use of personalized, speaker dependent (SD) together with speaker independent (SI) speed perturbation factors produces the largest performance improvements.…”
Section: Corpus (mentioning, confidence: 99%)
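The statement does not say how the speaker dependent (SD) factors are derived. A minimal sketch, assuming that an SD factor is the ratio of a target (dysarthric or elderly) speaker's speaking rate to the mean rate of the healthy control speakers, so that control speech perturbed with it is slowed to roughly the target's tempo, and assuming a fixed set of speaker independent (SI) factors shared by all speakers. The rate unit, the helper names `sd_factor` and `perturbation_factors`, and the SI values (0.9, 1.1) are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def sd_factor(target_rate: float, control_rates: Sequence[float]) -> float:
    """Hypothetical SD factor: target speaking rate / mean control speaking rate.

    A value below 1 slows control speech down toward the target speaker's tempo.
    """
    return target_rate / mean(control_rates)


def perturbation_factors(
    target_rates: Dict[str, float],
    control_rates: Sequence[float],
    si_factors: Sequence[float] = (0.9, 1.1),
) -> Dict[str, List[float]]:
    """Combine shared SI factors with one SD factor per target speaker."""
    plan: Dict[str, List[float]] = {}
    for spk, rate in target_rates.items():
        sd = round(sd_factor(rate, control_rates), 3)
        # Union removes an SD factor that coincides with an SI factor.
        plan[spk] = sorted(set(si_factors) | {sd})
    return plan
```

For example, with control speakers averaging 4.0 phones per second and a target speaker at 2.0, the SD factor is 0.5 and that speaker's factor set becomes [0.5, 0.9, 1.1].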
“…based assistive technologies more natural alternatives [23], [24] even though speech quality is degraded. To this end, in recent years there has been increasing interest in developing ASR technologies that are suitable for dysarthric [9], [25]- [40] and elderly speech [14], [41]- [46].…”
Section: Introduction (mentioning, confidence: 99%)
“…A 4-gram language model based on the DementiaBank Pitt transcripts, Switchboard and Fisher transcripts and additional text data of 392.4 million words from the Gigaword collection released by LDC (LDC2011T07) was used in decoding. More details can be found in [75]. The TDNN baseline system was trained with the DementiaBank Pitt data only and its performance was shown in line 7 of Tab.…”
Section: Experiments on DementiaBank Pitt Elderly Speech (mentioning, confidence: 99%)
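The excerpt does not say how the 4-gram model was estimated or how the four text sources were combined. As a minimal sketch of the first step only, the code below counts 4-grams over tokenized transcripts padded with sentence boundary markers and returns a maximum-likelihood estimate of P(word | three preceding words). A real decoding LM would add smoothing (for example Kneser-Ney discounting) and interpolation across sources; the names `ngram_counts` and `mle_prob` and the toy sentences used with them are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BOS, EOS = "<s>", "</s>"


def ngram_counts(sentences: Iterable[List[str]], n: int = 4) -> Tuple[Counter, Counter]:
    """Count n-grams and their (n-1)-word histories over padded sentences."""
    grams: Counter = Counter()
    hists: Counter = Counter()
    for words in sentences:
        padded = [BOS] * (n - 1) + words + [EOS]
        for i in range(n - 1, len(padded)):
            hist = tuple(padded[i - n + 1:i])   # the n-1 preceding tokens
            grams[hist + (padded[i],)] += 1
            hists[hist] += 1
    return grams, hists


def mle_prob(grams: Counter, hists: Counter, hist: Tuple[str, ...], word: str) -> float:
    """Unsmoothed ML estimate; returns 0.0 for an unseen history."""
    total = hists[hist]
    return grams[hist + (word,)] / total if total else 0.0
```

With the toy transcripts "the cookie jar" (twice) and "the cookie falls", the history ("<s>", "the", "cookie") is followed by "jar" in two of three cases, so its ML probability is 2/3.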
“…This presents a necessity for a scalable approach in screening cognitive impairments among older adults. Current NCD screening and diagnosis tests, such as the Montreal Cognitive Assessment (MoCA) [60], are mainly conducted as in-person tests by clinical professionals [89,100]. In-person assessments face limitations due to various factors, such as limited accessibility to the tests due to the lower mobility of some older adults, and shortages and inter-rater variabilities of clinicians [50].…”
Section: Introduction (mentioning, confidence: 99%)
“…To overcome these limitations, researchers are working on alternative solutions for NCD screening. Since NCDs are often manifested as communicative impairments, machine learning (ML) algorithms are offering a new type of support for NCD screening [3,13,22,38,47,55,56,66,72,97,100]. Therefore, a potential solution is to integrate speech analytics into a conversational agent (CA) to support highly accessible voice-based web applications that can interact with older adults through spoken language for screening NCD.…”
Section: Introduction (mentioning, confidence: 99%)