2022
DOI: 10.1016/j.cmpb.2021.106602

Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients

Cited by 18 publications (9 citation statements)
References 10 publications
“…Recent methods based on the use of linguistic features PPGs and vocoders have also been proposed and have proven to be effective [10,[23][24][25][26]28]. PPGs are high-level contextual representations obtained from the posterior probabilities of each phonetic class using a speaker-independent ASR system.…”
Section: Related Work (mentioning)
confidence: 99%
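
As a rough illustration of the statement above, a PPG can be thought of as a frames-by-classes matrix in which each row is a posterior distribution over phonetic classes. The sketch below assumes the ASR acoustic model emits one vector of unnormalized scores (logits) per frame; the phone inventory `PHONES` and the scores in `frame_logits` are made-up placeholders, not values from any cited system.

```python
import math

def softmax(logits):
    """Turn one frame's unnormalized scores into posterior probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def posteriorgram(frame_logits):
    """Return a frames x classes matrix; each row sums to 1."""
    return [softmax(frame) for frame in frame_logits]

# Hypothetical phonetic classes and acoustic-model scores for three frames.
PHONES = ["sil", "a", "k"]
frame_logits = [
    [2.0, 0.1, 0.1],
    [0.2, 1.5, 0.3],
    [0.0, 0.4, 2.2],
]

ppg = posteriorgram(frame_logits)
# Most likely phonetic class per frame, for inspection only.
best = [PHONES[max(range(len(row)), key=row.__getitem__)] for row in ppg]
```

A conversion model would typically consume the full `ppg` matrix rather than the per-frame argmax, since the soft posteriors retain uncertainty about the phonetic content.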
“…Fig. 1 depicts the proposed DVC 3.1 system, which was obtained by modifying the system proposed in our previous study [16]. DVC 3.1 involves four stages: data augmentation, speaker-dependent automatic speech recognition (SD-ASR) [25] training, conversion model training, and conversion.…”
Section: A Proposed Architecture (mentioning)
confidence: 99%
“…However, there is still room for improvement in terms of the temporal variability and instability of patients' speech. We recently proposed a deep learning-based DVC system (DVC 3.0) [16]. The system addresses the variability of patients' speech characteristics using the speaker-independent property of phonetic posteriorgrams (PPGs) [17,18] and converts phonemes into normal speech using a gated convolutional neural network model (gated CNN) with long-term memory effects.…”
mentioning
confidence: 99%
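
The statement above does not say how the gated CNN is built. As a hedged sketch only, one common form of gating is the gated linear unit, in which a linear convolution is multiplied elementwise by a sigmoid-activated convolution of the same input. The single-channel setup, the causal padding, and the names `conv1d`, `gated_conv1d`, `w_lin`, and `w_gate` are assumptions made for illustration, not details of DVC 3.0.

```python
import math

def conv1d(x, kernel):
    """Causal 1-D cross-correlation with left zero padding (output length = len(x))."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(x))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_conv1d(x, w_lin, w_gate):
    """Gated linear unit: conv(x, w_lin) * sigmoid(conv(x, w_gate))."""
    lin = conv1d(x, w_lin)
    gate = conv1d(x, w_gate)
    return [a * sigmoid(b) for a, b in zip(lin, gate)]

# Toy single-channel input with hypothetical kernels.
y = gated_conv1d([1.0, 2.0], w_lin=[0.0, 1.0], w_gate=[0.0, 0.0])
```

With an all-zero gate kernel, every gate value is sigmoid(0) = 0.5, so the output is half the linear path. Trained gate weights would instead let the layer pass or suppress each time step depending on its context.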
“…Other voice conversion methods include text-based approaches. The best-known text-based approach uses automatic speech recognition (ASR) models to extract phonetic posteriorgrams (PPGs), which are then used as linguistic information [23,24]. Text-based approaches that use ASR models have accurate linguistic information, are unlikely to be corrupted during voice conversion, and can even perform voice conversion between speakers of different languages if the ASR model used supports multiple languages.…”
Section: Introduction (mentioning)
confidence: 99%