2018 IEEE 23rd International Conference on Digital Signal Processing (DSP) 2018
DOI: 10.1109/icdsp.2018.8631888

Whisper to Normal Speech Based on Deep Neural Networks with MCC and F0 Features

Cited by 6 publications (4 citation statements). References: 14 publications.
“…Some version of a dataset derived from the TIMIT corpus was used in all cases included here. Specifically, whispered TIMIT (wTIMIT)¹ was used by Niranjan et al (2020), Patel et al (2021), Parmar et al (2019), and Patel et al (2019), while CSTR-NAM-TIMIT Plus was used by Gao et al (2021), Lian et al (2019a), Malaviya et al (2020), Pang et al (2020), Yu et al (2019), and Lian et al (2019b). The wTIMIT dataset uses the prompts in TIMIT, a well-known corpus often used for benchmarking in speech recognition, including 450 phonetically balanced sentences both in normal and whispered speech.…”
Section: Datasets (mentioning), confidence: 99%
“…A higher MCD value indicates a greater difference between the converted and the reference speech (lower is better). Additionally, Short-Time Objective Intelligibility (STOI) was also used in Lian et al (2019a), Pang et al (2020), Yu et al (2019), and Lian et al (2019b), and Perceptual Evaluation of Speech Quality was also used in Lian et al (2019a), Pang et al (2020), and Yu et al (2019). Finally, the two remaining papers used entirely different metrics from the rest.…”
Section: Objective Metrics (mentioning), confidence: 99%
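The MCD (mel-cepstral distortion) metric mentioned above is commonly defined, per time-aligned frame, as MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d − c'_d)²) dB, where c_d and c'_d are the d-th mel-cepstral coefficients of the converted and reference speech, and the result is averaged over frames. The excerpt does not give the formula, so the Python sketch below follows this common definition under stated assumptions: the two sequences are already frame-aligned (for example by dynamic time warping), and the energy coefficient c_0 is skipped by default. Both conventions vary between papers, and the function name `mel_cepstral_distortion` is hypothetical.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    converted: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
    skip_c0: bool = True,
) -> float:
    """Frame-averaged mel-cepstral distortion in dB (hypothetical helper).

    Assumes both inputs are already time-aligned (same number of frames,
    e.g. after DTW) and every frame has the same cepstral order.
    By default the 0th (energy) coefficient is excluded, a common but
    not universal convention.
    """
    if not converted or len(converted) != len(reference):
        raise ValueError("sequences must be non-empty and time-aligned")

    # (10 / ln 10) * sqrt(2), so that const * sqrt(sum) == (10/ln10) * sqrt(2 * sum)
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    start = 1 if skip_c0 else 0

    total = 0.0
    for conv_frame, ref_frame in zip(converted, reference):
        if len(conv_frame) != len(ref_frame):
            raise ValueError("frames must have the same cepstral order")
        # Squared Euclidean distance between the cepstral vectors of this frame
        sq_dist = sum(
            (c - r) ** 2 for c, r in zip(conv_frame[start:], ref_frame[start:])
        )
        total += const * math.sqrt(sq_dist)

    # Average the per-frame distortion over all frames
    return total / len(converted)
```

Identical inputs give 0 dB, and the value grows as the converted cepstra move away from the reference, matching the "lower is better" reading in the excerpt.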
“…Voice conversion (VC) is a technique for converting one voice into another voice with a different timbre under the condition of keeping linguistic information. In recent years, there have been many applications based on voice conversion, such as vocal conversion [1][2][3][4][5][6][7][8][9][10][11][12][13], singing voice conversion [14][15][16], emotion conversion [17,18], speech style conversion [19,20], conversion of whispers to normal voices [21,22], conversion of singing skills [23], voice correction [24], and so on. However, there are few studies on the conversion of human voices to musical instruments.…”
Section: Introduction (mentioning), confidence: 99%