Interspeech 2022
DOI: 10.21437/interspeech.2022-10061
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis

Cited by 5 publications (6 citation statements) · References 0 publications
“…In addition, some of the synthetic speech samples for SAG-Tacotron are publicly disclosed, which makes the comparison more credible. • BERT-Dep: This model uses the dependency structure and pre-trained BERT to generate word-level semantic representations, which are fused into the latent representations of Tacotron2 as additional features (Zhou et al. 2021). It is the most recent method that uses semantic information to improve the naturalness and expressiveness of speech synthesis.…”
Section: Baseline Methods
confidence: 99%
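For orientation, the fusion described in the BERT-Dep baseline is commonly implemented by upsampling word-level vectors to phoneme resolution and concatenating them with the encoder latents. The sketch below illustrates that general pattern only; the function name, tensor shapes, and the simple repeat-based upsampling are assumptions, not the paper's actual code.

```python
import torch

def fuse_word_semantics(encoder_out, word_feats, phones_per_word):
    """Fuse word-level semantic features into phoneme-level
    Tacotron2 encoder outputs (illustrative sketch, not the
    cited paper's implementation).

    encoder_out:     (T_phone, d_enc)  phoneme-level latents
    word_feats:      (N_words, d_sem)  word-level BERT-derived vectors
    phones_per_word: list of N_words ints summing to T_phone
    """
    # Upsample: repeat each word vector over its phoneme span.
    upsampled = torch.cat(
        [w.expand(n, -1) for w, n in zip(word_feats, phones_per_word)],
        dim=0,
    )
    # Concatenate as additional features along the channel axis.
    return torch.cat([encoder_out, upsampled], dim=-1)

# Example: 3 words spanning 7 phonemes.
enc = torch.randn(7, 512)
sem = torch.randn(3, 768)
fused = fuse_word_semantics(enc, sem, [2, 3, 2])
print(fused.shape)  # torch.Size([7, 1280])
```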
“…Therefore, many works improve the expressiveness of TTS by introducing syntactic information (Liu, Sisman, and Li 2021); these methods explicitly associate the input phoneme embeddings with syntactic relations. A word-level semantic representation method based on the dependency structure and pre-trained BERT is proposed in (Zhou et al. 2021). Although the above works explore the validity of linguistic knowledge, local modeling performance has not improved significantly on the TTS task.…”
Section: Introduction
confidence: 99%
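The dependency structure these methods rely on comes from an off-the-shelf syntactic parser. As a point of reference (not part of any cited system), a minimal spaCy example extracting head-dependent relations looks like this; en_core_web_sm is spaCy's standard small English pipeline:

```python
import spacy

# Run `python -m spacy download en_core_web_sm` once beforehand.
nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

# Each token exposes its dependency label and syntactic head,
# which together define the dependency edges such models consume.
for tok in doc:
    print(f"{tok.text:>6} --{tok.dep_}--> {tok.head.text}")
```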
“…Kim, Kong, and Son (2021) and Tatanov, Beliaev, and Ginsburg (2022) boosted the expressiveness of speech by applying various methods proposed in the field of natural language processing (NLP) to the speech domain. In particular, GraphSpeech (Liu, Sisman, and Li 2021) and the Relational Gated Graph Network (RGGN) (Zhou et al. 2022) claimed that the syntactic and semantic information of text affects the naturalness and expressiveness of speech. They improved performance by utilizing graph networks focused on representations based on dependency relations.…”
Section: Introduction
confidence: 99%
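As a rough illustration of the general pattern, and not RGGN's or GraphSpeech's actual architecture: word embeddings serve as node features and dependency relations as edges, over which a gated graph convolution propagates information. The sketch below uses PyTorch Geometric's GatedGraphConv; the dimensions and edge list are made up for illustration.

```python
import torch
from torch_geometric.nn import GatedGraphConv

# Node features: one vector per word (here 5 words, 256-dim).
x = torch.randn(5, 256)

# Dependency edges as (head -> dependent) index pairs, e.g. taken
# from a parser; this particular edge list is a fabricated example.
edge_index = torch.tensor([[4, 4, 4, 2],
                           [0, 1, 2, 3]], dtype=torch.long)

# Gated graph convolution propagates node states along the edges.
conv = GatedGraphConv(out_channels=256, num_layers=3)
out = conv(x, edge_index)
print(out.shape)  # torch.Size([5, 256])
```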
“…We need a proper way to align the dependency relations with H_S because we use them simultaneously. Therefore, we utilize Word-level Average Pooling to align them, similar to the Subword-to-Word Mapping of Zhou et al. (2022). We use word-level average pooling (AP), represented as,…”
Section: Introduction
confidence: 99%
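Since the quoted equation is elided, the following is a minimal sketch of what word-level average pooling over subword hidden states typically looks like; the variable names are illustrative, with subword_hidden standing in for H_S.

```python
import torch

def word_level_average_pooling(subword_hidden, word_ids):
    """Average subword hidden states into one vector per word.

    subword_hidden: (T_subword, d) subword-level hidden states (H_S)
    word_ids:       list of length T_subword mapping each subword to
                    its word index (e.g., as produced by a HuggingFace
                    fast tokenizer's word_ids()).
    """
    n_words = max(word_ids) + 1
    d = subword_hidden.size(-1)
    sums = torch.zeros(n_words, d)
    counts = torch.zeros(n_words, 1)
    for i, w in enumerate(word_ids):
        sums[w] += subword_hidden[i]
        counts[w] += 1
    return sums / counts

# Example: "playing" -> ["play", "##ing"]; 3 words over 4 subwords.
H_S = torch.randn(4, 768)
word_ids = [0, 1, 1, 2]
H_W = word_level_average_pooling(H_S, word_ids)
print(H_W.shape)  # torch.Size([3, 768])
```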