2020
DOI: 10.1007/s11042-020-08838-1

A mapping model of spectral tilt in normal-to-Lombard speech conversion for intelligibility enhancement

Cited by 6 publications (4 citation statements)
References 30 publications
“…Shreyas Seshadri [20] used a parallel DNN model instead of a BGMM to map normal speech to Lombard speech. In 2020, a mapping model based on linear prediction and tilt correction was proposed [16]. Compared with previous studies, a deep neural network is used for the high-dimensional mapping instead of a Gaussian model, and a tilt-correction module is added to further reduce the mapping error of the formant amplitudes.…”
Section: Transformation Methods Based On Feature Mapping
confidence: 99%
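The mapping-plus-tilt-correction idea referenced in the statement above can be pictured with a short sketch. The Python fragment below is a minimal illustration only, assuming a small feed-forward network over per-frame spectral-envelope features and a simple linear high-frequency emphasis as the tilt correction; the layer sizes, feature dimension, and correction rule are hypothetical and not taken from the cited paper.

```python
# Illustrative sketch only: a small feed-forward network mapping normal-speech
# spectral-envelope features to Lombard-style features, followed by a
# hypothetical tilt-correction step. Layer sizes, feature choice, and the
# correction rule are assumptions, not the cited paper's exact design.
import torch
import torch.nn as nn

class NormalToLombardMapper(nn.Module):
    def __init__(self, feat_dim: int = 40, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),          # predicted Lombard-style envelope
        )

    def forward(self, normal_feats: torch.Tensor) -> torch.Tensor:
        return self.net(normal_feats)

def tilt_correction(envelope_db: torch.Tensor, boost_db_per_band: float = 0.15) -> torch.Tensor:
    """Apply a simple linear high-frequency emphasis to flatten spectral tilt.

    A stand-in for the tilt-correction module described in the citation:
    higher bands receive a gradually larger gain, reducing the downward
    spectral tilt typical of normal (non-Lombard) speech.
    """
    n_bands = envelope_db.shape[-1]
    ramp = torch.arange(n_bands, dtype=envelope_db.dtype) * boost_db_per_band
    return envelope_db + ramp

# Usage: map a batch of 40-dimensional envelope frames, then correct the tilt.
mapper = NormalToLombardMapper()
frames = torch.randn(8, 40)                # 8 frames of normal-speech features
lombard_like = tilt_correction(mapper(frames))
```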
“…The data-driven approach uses a large amount of speech data to build a feature-mapping model from normal speech to Lombard speech with machine-learning algorithms, achieving speech style conversion (SSC). In recent years, Bayesian Gaussian mixture models (BGMMs) [20,28], deep neural networks (DNNs) [29,30], and recurrent neural networks (RNNs) and their variants such as long short-term memory (LSTM) networks [31] have been widely used to map acoustic features. However, current parallel SSC methods require a parallel corpus of source and target speech and usually apply a temporal alignment operation to the training data, an operation that may introduce some feature distortion.…”
Section: Related Work
confidence: 99%
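As a rough illustration of the temporal alignment step mentioned in the statement above, the sketch below aligns a parallel pair of normal/Lombard feature sequences with plain dynamic time warping before forming frame-wise training pairs. The feature type, distance metric, and array shapes are assumptions for illustration, not details from the cited works.

```python
# Illustrative sketch only: time-aligning one parallel normal/Lombard utterance
# pair with a basic dynamic-time-warping pass before training a frame-wise
# mapping model. Generic per-frame feature vectors and Euclidean distance are
# assumed here; the cited works may align different features.
import numpy as np

def dtw_align(src: np.ndarray, tgt: np.ndarray):
    """Return index pairs (i, j) aligning src[i] to tgt[j] (both T x D arrays)."""
    n, m = len(src), len(tgt)
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],      # step in src
                                                 acc[i, j - 1],      # step in tgt
                                                 acc[i - 1, j - 1])  # step in both
    # Backtrack from the end of both sequences to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Usage: build frame-aligned (normal, Lombard) training pairs for a mapper.
normal = np.random.randn(120, 40)    # 120 frames of normal-speech features
lombard = np.random.randn(135, 40)   # 135 frames of the same sentence, Lombard
pairs = [(normal[i], lombard[j]) for i, j in dtw_align(normal, lombard)]
```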
“…At present, DNN-based speech enhancement can be roughly divided into two approaches. The first seeks a mapping between the noisy and clean speech spectra [16], and the other is based on masking [17]. DNN-based speech enhancement usually produces overly smoothed speech, resulting in distortion and loss of intelligibility, and large DNNs occupy considerable memory and are slow to train [18].…”
Section: Introduction
confidence: 99%
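The two DNN enhancement approaches contrasted in the statement above, spectral mapping and masking, can be sketched as two small networks with different training targets. The layer sizes and the ideal-ratio-mask formula used as the mask target are common choices assumed here for illustration, not details taken from the cited references.

```python
# Illustrative sketch only: (a) direct spectral mapping from a noisy magnitude
# spectrum to a clean one, and (b) estimating a time-frequency mask applied to
# the noisy spectrum. Shapes and the mask target are assumptions.
import torch
import torch.nn as nn

N_BINS = 257  # e.g. magnitude bins of a 512-point STFT

# (a) Spectral-mapping network: predicts the clean magnitude spectrum directly.
mapping_net = nn.Sequential(
    nn.Linear(N_BINS, 512), nn.ReLU(),
    nn.Linear(512, N_BINS), nn.ReLU(),    # non-negative magnitude estimate
)

# (b) Masking network: predicts a [0, 1] mask multiplied onto the noisy spectrum.
mask_net = nn.Sequential(
    nn.Linear(N_BINS, 512), nn.ReLU(),
    nn.Linear(512, N_BINS), nn.Sigmoid(),
)

def ideal_ratio_mask(clean_mag: torch.Tensor, noise_mag: torch.Tensor) -> torch.Tensor:
    """A common mask training target: |S| / (|S| + |N|) per time-frequency bin."""
    return clean_mag / (clean_mag + noise_mag + 1e-8)

# Usage on one batch of magnitude frames.
noisy = torch.rand(16, N_BINS)
clean_estimate_a = mapping_net(noisy)          # mapping-based enhancement
clean_estimate_b = mask_net(noisy) * noisy     # masking-based enhancement
```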