Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks

Coto-Jiménez, Marvin

doi:10.20944/preprints201905.0228.v1

2019

Preprint

Abstract: Several researchers have considered deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters map the synthetic speech to the natural speech, treating the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but many aspects of the results and of the process itself can still be improved. In this paper, we introduce a …

Cited by 6 publications

References 31 publications

Automated Detection of Sleep Stages Using Deep Learning Techniques: A Systematic Review of the Last Decade (2010–2020)

et al. 2020

Sleep is vital for one’s general well-being, but it is often neglected, which has led to an increase in sleep disorders worldwide. Indicators of sleep disorders, such as sleep interruptions, extreme daytime drowsiness, or snoring, can be detected with sleep analysis. However, sleep analysis relies on visual inspection by experts, and is susceptible to inter- and intra-observer variabilities. One way to overcome these limitations is to support experts with a programmed diagnostic tool (PDT) based on artificial intelligence for timely detection of sleep disturbances. Artificial intelligence technology, such as deep learning (DL), ensures that data are fully utilized with little to no information loss during training. This paper provides a comprehensive review of 36 studies, published between March 2013 and August 2020, which employed DL models to analyze overnight polysomnogram (PSG) recordings for the classification of sleep stages. Our analysis shows that more than half of the studies employed convolutional neural networks (CNNs) on electroencephalography (EEG) recordings for sleep stage classification and achieved high performance. Our study also underscores that CNN models, particularly one-dimensional CNN models, are advantageous in yielding higher accuracies for classification. More importantly, we noticed that EEG alone is not sufficient to achieve robust classification results. Future automated detection systems should consider other PSG recordings, such as electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG) signals, along with input from human experts, to achieve the required sleep stage classification robustness. Hence, for DL methods to be fully realized as a practical PDT for sleep stage scoring in clinical applications, inclusion of other PSG recordings, besides EEG recordings, is necessary. In this respect, our report includes methods published in the last decade, underscoring the use of DL models with other PSG recordings, for scoring of sleep stages.

Prediction and Comparison of In-Vehicle CO2 Concentration Based on ARIMA and LSTM Models

Han, Lin, Qin

2023

Applied Sciences

An increase in the carbon dioxide (CO2) concentration within a vehicle can lead to a decrease in air quality, resulting in numerous adverse effects on the human body. Therefore, it is very important to know the in-vehicle CO2 concentration level and to accurately predict changes in concentration. The purpose of this research is to investigate in-vehicle concentration levels of CO2, comparing the accuracy of an autoregressive integrated moving average (ARIMA) model and a long short-term memory (LSTM) model in predicting the change in CO2 concentration. We conducted a field test to obtain raw in-vehicle CO2 concentration data while driving, and built prediction models of CO2 concentration with ARIMA and LSTM. We selected mean absolute percentage error (MAPE) and root mean squared error (RMSE) as the evaluation indicators. The findings indicate the following: (1) With the vehicle windows closed and recirculation ventilation mode activated, in-vehicle CO2 concentration increases rapidly. During testing, CO2 accumulation rates were measured at 1.43 ppm/s for one occupant and 3.52 ppm/s for three occupants within a 20 min driving period. Average concentrations exceeded 1000 ppm, so it is recommended to improve ventilation promptly while driving. (2) The MAPE values of the ARIMA and LSTM predictions are 0.46% and 0.56%, respectively. The RMSE values are 19.62 ppm and 22.76 ppm, respectively. The prediction results demonstrate that both models effectively forecast changes in CO2 concentration in a vehicle’s interior, but the prediction accuracy of ARIMA is better than that of LSTM. The research findings provide theoretical guidance to traffic safety managers in selecting suitable models for predicting in-vehicle CO2 concentrations and in establishing an effective in-vehicle ventilation warning control system.

Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis

Coto-Jiménez

2021

Biomimetics

Statistical parametric speech synthesis based on Hidden Markov Models (HMM) has been an important technique for the production of artificial voices, due to its ability to produce results with high intelligibility and sophisticated features such as voice conversion and accent modification with a small footprint, particularly for low-resource languages where deep learning-based techniques remain unexplored. Despite this progress, the quality of HMM-based results does not reach that of the predominant approaches, based on unit selection of speech segments or deep learning. One proposal to improve the quality of HMM-based speech has been to incorporate postfiltering stages, which aim to increase quality while preserving the advantages of the process. In this paper, we present a new approach to postfiltering synthesized voices with the application of discriminative postfilters, using several long short-term memory (LSTM) deep neural networks. Our motivation stems from modeling specific mappings from synthesized to natural speech on those segments corresponding to voiced or unvoiced sounds, due to the different qualities of those sounds and how HMM-based voices can present distinct degradation on each one. The paper analyses the discriminative postfilters obtained using five voices, evaluated using three objective measures, Mel cepstral distance and subjective tests. The results indicate the advantages of the discriminative postfilters in comparison with the HTS voice and the non-discriminative postfilters.
