ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053355

GraphTTS: Graph-to-Sequence Modelling in Neural Text-to-Speech

Abstract: This paper leverages the graph-to-sequence method in neural text-to-speech (GraphTTS), which maps the graph embedding of the input sequence to spectrograms. The graphical inputs consist of node and edge representations constructed from input texts. The encoding of these graphical inputs incorporates syntax information via a GNN encoder module. In addition, applying the encoder of GraphTTS as a graph auxiliary encoder (GAE) can extract prosody information from the semantic structure of texts. This can remove the man…
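As a rough illustration of the graph-to-sequence idea summarised above, the sketch below embeds input symbols as graph nodes and performs one round of message passing over an adjacency matrix derived from the text. The graph construction (sequential plus syntactic-dependency edges), class names, and layer sizes are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal sketch of the graph-to-sequence idea: symbol nodes, message passing
# over a text-derived adjacency matrix. Sizes and names are illustrative.
import torch
import torch.nn as nn


class GraphEncoder(nn.Module):
    """One round of message passing over a text graph (hypothetical design)."""

    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # node (symbol) embeddings
        self.message = nn.Linear(dim, dim)           # transform neighbour states
        self.update = nn.GRUCell(dim, dim)           # gated node-state update

    def forward(self, tokens: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # tokens: (N,) symbol ids; adj: (N, N) adjacency built from the text
        # (e.g. neighbouring symbols plus syntactic-dependency edges).
        h = self.embed(tokens)            # (N, dim) initial node states
        m = adj @ self.message(h)         # aggregate transformed neighbour states
        return self.update(m, h)          # updated node representations
```

The graph-encoded node states would then feed an attention-based spectrogram decoder (e.g. a Tacotron-style decoder); that part is omitted here.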

Cited by 13 publications (9 citation statements)
References 12 publications
“…The relational gated graph network (RGGN) aims to learn the semantic representation from the dependency graph in section 2.1. The gated graph neural network (GGNN) is adopted due to its long-term propagation of information flow [23]. Additionally, different weight matrices corresponding to different types of edges are introduced to GGNN inspired by relational graph convolutional network [26].…”
Section: Relational Gated Graph Network (mentioning)
Confidence: 99%
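The passage above describes GGNN-style gated updates combined with relation-specific weight matrices in the spirit of R-GCN. A minimal sketch of one such layer, with assumed names and dimensions rather than the cited implementation, might look like this:

```python
# Hedged sketch of a relational gated graph layer: GGNN-style gated updates
# with one weight matrix per edge (relation) type, as the quote describes.
import torch
import torch.nn as nn


class RelationalGatedLayer(nn.Module):
    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        # One message transform per relation type (R-GCN-inspired).
        self.rel_weights = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(num_relations)]
        )
        self.gru = nn.GRUCell(dim, dim)  # gated update as in GGNN

    def forward(self, h, adjs):
        # h: (N, dim) node states; adjs[r]: (N, N) adjacency for relation r.
        msg = sum(adjs[r] @ w(h) for r, w in enumerate(self.rel_weights))
        return self.gru(msg, h)  # GRU gate decides how much each node updates
```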
“…However, GraphSpeech uses a recurrent neural network (RNN) based relation encoder to model the dependency relation, which is insufficient to capture information about complex sentence structure. Graph neural networks (GNNs) [21,22] are further introduced to TTS for their ability to learn representations via message passing among the nodes of graphs, as in GraphTTS [23] and GraphPB [24], but all of them use simple structures designed only from the text, without considering deeper semantics.…”
Section: Introduction (mentioning)
Confidence: 99%
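For concreteness, here is a toy example of the kind of graph structure "designed only from the text" that such models pass messages over: an adjacency matrix built from a dependency parse. The parse is hard-coded and hypothetical, purely for illustration.

```python
# Toy text graph: adjacency matrix from a (hard-coded, hypothetical) dependency
# parse of "the cat sat"; token index -> head index (-1 marks the root).
import torch

heads = {0: 1, 1: 2, 2: -1}

n = len(heads)
adj = torch.zeros(n, n)
for child, head in heads.items():
    if head >= 0:
        adj[child, head] = 1.0  # child -> head edge
        adj[head, child] = 1.0  # symmetric edge for undirected message passing

# `adj` would then drive message passing in a GNN encoder like the layers
# sketched above.
```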
“…Sequence-to-sequence learning is a neural-network method within ML that is mostly used in language processing models [17][18][19][20][21][22][23]. It can be implemented with recurrent neural networks (RNNs) in an encoder-decoder architecture, as in machine translation, mapping an input sequence to an output sequence with attention values.…”
Section: Introduction (mentioning)
Confidence: 99%
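A minimal sketch of the RNN encoder-decoder with attention described in the quoted passage; the module names, sizes, and the particular additive-attention variant are illustrative assumptions, not any specific cited system.

```python
# Minimal RNN encoder-decoder with attention (illustrative sizes and names).
import torch
import torch.nn as nn


class Seq2SeqWithAttention(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, hid: int = 128):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hid, batch_first=True)
        self.decoder_cell = nn.GRUCell(out_dim + hid, hid)
        self.attn = nn.Linear(2 * hid, 1)   # scores decoder state vs encoder states
        self.proj = nn.Linear(hid, out_dim)

    def forward(self, x: torch.Tensor, steps: int) -> torch.Tensor:
        # x: (B, T_in, in_dim); returns (B, steps, out_dim).
        enc, _ = self.encoder(x)             # (B, T_in, hid) encoder states
        B, T, H = enc.shape
        s = enc.new_zeros(B, H)              # decoder hidden state
        y = enc.new_zeros(B, self.proj.out_features)
        outputs = []
        for _ in range(steps):
            # Attention weights over encoder states for the current decoder state.
            scores = self.attn(torch.cat([enc, s.unsqueeze(1).expand(-1, T, -1)], -1))
            ctx = (torch.softmax(scores, dim=1) * enc).sum(dim=1)  # (B, hid) context
            s = self.decoder_cell(torch.cat([y, ctx], dim=-1), s)
            y = self.proj(s)
            outputs.append(y)
        return torch.stack(outputs, dim=1)
```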
“…Another benefit of SAN is that it functions with intra-attention [14,16], which has a shorter path for modelling long-distance context. Despite this progress [15], Transformer TTS does not explicitly associate input text with output utterances from a syntactic point of view at the sentence level, which has been shown useful in speaking style and prosody modelling [17][18][19][20][21]. As a result, the rendering of utterances is adversely affected, especially for long sentences.…”
Section: Introduction (mentioning)
Confidence: 99%
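The "shorter path" property of intra-attention mentioned above comes from every position attending to every other position in a single step. A minimal scaled dot-product self-attention sketch (shapes and names are illustrative assumptions):

```python
# Scaled dot-product self-attention: all positions interact in one step,
# hence the constant path length for long-range context.
import math
import torch


def self_attention(x: torch.Tensor, wq: torch.Tensor,
                   wk: torch.Tensor, wv: torch.Tensor) -> torch.Tensor:
    # x: (T, d) sequence of token states; wq/wk/wv: (d, d) projection matrices.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 1) / math.sqrt(x.size(-1))  # (T, T) all pairs
    return torch.softmax(scores, dim=-1) @ v                 # context-mixed states
```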