2021
DOI: 10.1109/access.2021.3115608

Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

Abstract: We address the low performance of automatic speech recognition (ASR) for elderly speakers through feature adaptation that is agnostic to the ASR system. Most training datasets for speech recognition models are collected from adult speakers. Consequently, most commercial speech recognition systems tend to perform well on adult speakers. In other words, the limited diversity of speakers in the training datasets yields unreliable performance for minority (e.g., elderly) speakers due to the inf…

Cited by 7 publications (7 citation statements)
References 32 publications
“…MT system has been developed to include a voice system for the translated terms (Kim et al, 2021). This significant property assists users to learn words' spelling easily; especially it can be applied in dual mode (i.e.…”
Section: Translation Voice Accompanied With Translated Terms (mentioning)
confidence: 99%
“…Compared with a manual translation system, MT system needs less time and effort the user to start the translation process (Filmer, 2019;Kim et al, 2021). On the contrary, the manual translation needs the user to prepare a dictionary and search manually and alphabetically to find the target term.…”
Section: Ease Of Access and Little Effort Required (mentioning)
confidence: 99%
“…The evaluation data highlighted tentative guesses at the age of the presenters and all four lectures received neutral scores for this category although one evaluator was more confident that they could judge the age of two presenters as being between 40-50 years old. Research into the effect of age on the voice tends to have been linked to an older population as described by Kim et al [9]. The authors discuss the concept of a voice conversion framework coupled with linguistic information that may help to reduce issues of bias where the voice files used to generate data sets are mainly from younger adults.…”
Section: Findings and Discussion (mentioning)
confidence: 99%
“…In Figure 2a, we show an example of a Mel-spectrogram feature. These types of transforms allow us to handle the original waveform by extracting useful features and achieve human-level performance in various speech classification tasks [4], [17]- [19].…”
Section: Speech Classification Systems Transform An Audio Waveform (mentioning)
confidence: 99%