2021
DOI: 10.1109/taslp.2020.3040523

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis

Abstract: Prosodic phrasing is an important factor that affects naturalness and intelligibility in text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing when large text and speech corpora are available. However, for low-resource languages such as Mongolian, prosodic phrasing remains a challenge for various reasons. First, the database suitable for system training is limited; second, word composition knowledge that is prosody-informing has not been used in prosodic phrase modeling…

Cited by 28 publications (10 citation statements)
References 62 publications
“…Speech conveys information not only through phonetic content, but also through its prosody. Speech prosody can affect syntactic and semantic interpretation of an utterance [22], [23], that is called linguistic prosody. Speech prosody is also used to display one's emotional state, that is referred to as affective prosody.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another benefit of SAN is to function with intra-attention [14,16], which has a shorter path to model long distance context. Despite the progress [15], Transformer TTS doesn't explicitly associate input text with output utterances from syntactic point of view at sentence level, which is proven useful in speaking style and prosody modeling [17][18][19][20][21]. As a result, the rendering of utterance is adversely affected especially for long sentences.…”
Section: Introduction (mentioning)
confidence: 99%
“…Electronic synthetic tones bring rich new sound experience to music of various styles and themes. Electronic musical instruments differ from traditional acoustic instruments in sound rendering principle and acoustic features [14][15][16][17][18][19]. Miranda et al. [20] expounded the computer-aided means to realize the acoustic features, voice editing, and modulation of electronic sound melodies and provided a valuable reference for applying electronic sound melodies in modern music creation.…”
Section: Introduction (mentioning)
confidence: 99%