9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-27

Investigating Very Deep Highway Networks for Parametric Speech Synthesis

Abstract: The depth of a neural network is a vital factor affecting its performance. Recently, a new architecture called the highway network was proposed. This network facilitates the training of very deep neural networks by using gate units to control an information highway over the conventional hidden layer. For the speech synthesis task, we investigate the performance of highway networks with up to 40 hidden layers. The results suggest that a highway network with 14 non-linear transformation layers is the bes…
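The gating mechanism mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly used highway formulation y = T(x) * H(x) + (1 - T(x)) * x, with a tanh hidden transform and a sigmoid transform gate. The function names, the tanh activation, the negative gate-bias initialisation, and the layer width are illustrative assumptions; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x (assumed form)."""
    h = np.tanh(x @ W_h + b_h)     # conventional hidden-layer transform H(x)
    t = sigmoid(x @ W_t + b_t)     # transform gate T(x), values in (0, 1)
    return t * h + (1.0 - t) * x   # (1 - t) carries the input through unchanged

def init_stack(n_layers, dim, gate_bias=-2.0, seed=0):
    """Hypothetical initialiser; a negative gate bias favours carrying the input."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(n_layers):
        W_h = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim))
        W_t = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim))
        params.append((W_h, np.zeros(dim), W_t, np.full(dim, gate_bias)))
    return params

def highway_forward(x, params):
    """Apply the highway layers in sequence; input and output widths match."""
    for W_h, b_h, W_t, b_t in params:
        x = highway_layer(x, W_h, b_h, W_t, b_t)
    return x

# Illustrative usage: a batch of 4 vectors through 14 stacked highway layers.
x = np.random.default_rng(1).normal(size=(4, 8))
y = highway_forward(x, init_stack(n_layers=14, dim=8))
```

Because each layer mixes its input with the transformed signal, the layer width has to stay constant across the stack in this sketch. When a gate saturates near zero the layer reduces to an identity mapping, which is the property usually credited with making very deep stacks trainable.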

Cited by 15 publications (13 citation statements)
References 14 publications
“…Recently, researchers have used NNs as alternatives to HMMs to jointly model the F0 and other spectral features [22], [23], [24]. Some researchers use NNs exclusively for F0 modeling [25], [12], which may be reasonable because it was recently found that NNs may prioritize the spectral features over the F0 [26]. In fact, many NN-based F0 models have been proposed before the advent of SPSS-based TTS systems [27], [28], [29].…”
Section: A Classical Models (citation type: mentioning)
confidence: 99%
“…Highway networks [8], [9] are weighted skip-connections between layers, and they often connect hidden layers. Given that the input and output are often in the same domain (e.g., cepstrum) in VC, we propose a VC using highway networks connected from the input to output as follows:…”
Section: VC Using Input-to-Output Highway Network (citation type: mentioning)
confidence: 99%
“…It is well known that the generated sequences of parameters from the HMMs are temporally smoothed, producing perceptual differences between synthetic and natural speech. There have been several attempts to improve the quality of synthesized speech, based on Deep Learning approaches: The first main approach is to substitute the HMM for deep neural networks (DNN) [7] [8] [9] [10], learning the map between linguistic specification directly to speech parameters. The second approach is to apply post-filters for the parameters generated by the HMMs [11] [12] [13].…”
Section: Introduction (citation type: mentioning)
confidence: 99%