2020
DOI: 10.1016/j.neunet.2020.01.034

On the localness modeling for the self-attention based end-to-end speech synthesis

Cited by 42 publications (22 citation statements)
References 11 publications
“…While the weights can in theory be distributed over all input positions, they are often concentrated locally, with output position i focusing in particular on input position i. Previous works on various sequence tasks (Yang et al., 2020; Zhang et al., 2020b) have shown heavy weights on the diagonal of the encoder self-attention matrices.…”
Section: Position-based Self-attention Query
confidence: 99%
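The statement above describes the diagonal concentration that localness modelling exploits. As a rough illustration only, not the formulation used in the cited paper, the sketch below adds a Gaussian locality bias centred on each query position to plain scaled dot-product self-attention logits; the function name, the fixed `sigma`, and the omission of learned query/key projections are assumptions made for brevity.

```python
import numpy as np

def self_attention_with_locality(x, sigma=None):
    """Scaled dot-product self-attention over a sequence x of shape (T, d).

    If sigma is given, a Gaussian bias centred on each query position is
    added to the logits, pushing the weights toward the diagonal (one
    common way to model localness; hypothetical, not the paper's exact form).
    """
    T, d = x.shape
    logits = x @ x.T / np.sqrt(d)           # query/key projections omitted for brevity
    if sigma is not None:
        pos = np.arange(T)
        dist = pos[:, None] - pos[None, :]  # i - j for every query/key pair
        logits += -(dist ** 2) / (2.0 * sigma ** 2)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

# Diagonal mass: how much weight each output position puts on its own input position.
_, w = self_attention_with_locality(np.random.randn(8, 16), sigma=1.5)
print(np.mean(np.diag(w)))
```

With a small `sigma` the softmax mass collapses onto nearby positions, reproducing the heavy diagonal the cited works report; with `sigma=None` the weights depend only on content similarity.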
“…Recurrent neural networks (RNN) have been proven to excel at various sequential tasks, such as speech recognition [79], speech synthesis [80], handwriting recognition [81], and image-to-text [82]. In particular, Long Short-Term Memory (LSTM) layers [83] and the transformer self-attention mechanism [84] are robust architectures for modelling long-range sequence data with autocorrelations, such as time series and natural language.…”
Section: Figure 1: Distribution Of Different Type Of Datasets (A) Dat…
confidence: 99%
“…Recurrent neural networks (RNN) have been proven to excel at various sequential tasks, such as speech recognition [80], speech synthesis [81], handwriting recognition [82], and image-to-text [83]. In particular, Long Short-Term Memory (LSTM) layers [84], transformers and the self-attention mechanism [85] … Van den Oord et al. [69] designed two variants of recurrent image models: PixelRNN and PixelCNN.…”
Section: Introduction
confidence: 99%