2021
DOI: 10.48550/arxiv.2104.06835
Preprint

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis

Abstract: Semantic information in a sentence is crucial for improving the expressiveness of a text-to-speech (TTS) system, but it cannot be learned well from limited TTS training data with current encoder structures alone. With the development of large-scale pre-trained text representations, bidirectional encoder representations from transformers (BERT) have been shown to capture contextual semantic information and have been applied to TTS as an additional input. However, BERT cannot explicitly associate semantic tokens from poi…
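The abstract describes supplying BERT-derived semantic representations to a TTS model as an additional encoder input. As a rough illustration only, a word-level feature extraction step might look like the sketch below; it assumes a Hugging Face "bert-base-uncased" checkpoint and mean-pooling of subword vectors, neither of which is stated in the excerpt, and the helper word_level_embeddings is hypothetical.

import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def word_level_embeddings(sentence):
    """Return one vector per word by mean-pooling BERT subword states (illustrative)."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]   # (num_subwords, 768)
    word_ids = enc.word_ids()                       # maps each subword back to its word index
    pooled = []
    for i in range(len(words)):
        rows = [j for j, w in enumerate(word_ids) if w == i]
        pooled.append(hidden[rows].mean(dim=0))
    # (num_words, 768); in a TTS front end these vectors would be aligned to
    # phoneme positions and concatenated with the encoder inputs
    return torch.stack(pooled)

print(word_level_embeddings("Enhancing word level semantic representation").shape)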


Cited by 1 publication (1 citation statement)
References 21 publications
“…One way to alleviate the one-to-many mapping problem and combat over-smoothing prediction is to use advanced generative models to implicitly learn the variation information, which can better model the multi-modal distribution. • Text pre-training [78,101,387,140,95,447], which can provide better text representations by using pre-trained word embeddings or model parameters.…”
Section: Perspective Category Description Work
Mentioning confidence: 99%