2020
DOI: 10.1109/taslp.2020.2971417

Glottal Flow Synthesis for Whisper-to-Speech Conversion

Abstract: Whisper-to-speech conversion is motivated by laryngeal disorders, in which malfunction of the vocal folds leads to loss of voicing. Many patients with laryngeal disorders can still produce functional whispers, since these are characterised by the absence of vocal fold vibration. Whispers therefore constitute a common ground for speech rehabilitation across many kinds of laryngeal disorder. Whisper-to-speech conversion involves recreating natural-sounding speech from recorded whispers, and is a non-invasive and…

Citations: cited by 13 publications (15 citation statements)
References: 56 publications (144 reference statements)

“…The spectral tilt of the glottal source is computed over several glottal periods (including both glottal open and closed phases), and therefore the IAIF method does not call for the extraction of glottal closure instants (GCIs). During the past two decades, IAIF has been used in many areas, such as in parametric speech synthesis ([52] [53] [54] [55]), speaking style conversion [56], the detection of stress [57], and depression [58], as well as in emotion recognition [59] [60]. For a detailed description of the IAIF method, the reader is referred to Section II-B in the work of Raitio et al [52].…”
Section: B Baseline Features (mentioning)
confidence: 99%
“…In the presence of noise in the glottis signal, we showed that F_GF captures the position of the dominant frequency region of this noise.²¹ Therefore, in the control group we observe an F_GF distribution around a few hundred Hertz, the order of magnitude of vocal fold vibration. The increase of F_GF with the degree of impairment follows the increasing amount of noise in the glottal signal, linked to the progressive loss of phonation.…”
Section: Effect Of Subject Group (mentioning)
confidence: 79%
“…Note that the GFM-IAIF glottis filter is fully causal compared to Equation 1, yet it does not affect the magnitude spectrum, from which are extracted the spectral parameters (eg, F_GF, B_GF, and F_ST) that we now use for analysis of dysphonic speech. GFM-IAIF has also recently been demonstrated in the conversion of postlaryngectomy speech to phonated speech,²¹ lending empirical support to its effectiveness at modeling dysfunctional speech.…”
Section: Source-filter Decomposition Methods (mentioning)
confidence: 94%