iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre

Zhang, Guangyan; Qin, Ying; Zhang, Wenjie; Wu, Jialun; Li, Mei; Gai, Yutao; Jiang, Feijun; Lee, Tan

doi:10.1109/taslp.2023.3268571

Cited by 10 publications

(1 citation statement)

References 46 publications

“…Despite numerous studies in the fields of timbre separation, synthesis, and restoration, existing methods are often limited by the resolution constraints of time-frequency analysis, making it difficult to handle complex audio signals, especially the separation of mixed sounds from traditional ethnic musical instruments [13,14]. Furthermore, existing synthesis methods often lack sufficient flexibility and expressiveness when dealing with the subtle differences in the timbres of ethnic instruments, while timbre restoration techniques still have shortcomings in continuity processing and naturalness restoration [15][16][17].…”

Section: Introduction (mentioning)

confidence: 99%

Synthesis and Restoration of Traditional Ethnic Musical Instrument Timbres Based on Time-Frequency Analysis

Chen, Xiang, Xiong

2024

With the advent of the digital age, the preservation and restoration of the timbres of traditional ethnic musical instruments have emerged as significant areas of study in musicology and signal processing. Music serves not only as a bridge between history and culture but also plays an irreplaceable role in expressing ethnic characteristics and emotions. The timbres of traditional ethnic musical instruments, owing to their unique musical expressiveness and cultural value, have attracted widespread attention from both the academic and industrial sectors. However, many valuable timbre recordings are facing threats of damage and disappearance due to limitations in old recording technologies and preservation conditions. Moreover, existing timbre processing technologies still require improvements in separation accuracy, synthesis authenticity, and restoration naturalness. This study aims to achieve efficient separation, authentic synthesis, and natural restoration of the sounds of traditional ethnic musical instruments through advanced signal processing methods. Initially, this paper discusses a sound separation technique for traditional ethnic musical instruments based on time-frequency analysis, addressing the issue of insufficient resolution in complex audio signals. Subsequently, it proposes a timbre synthesis method based on the Transformer deep learning model, which can understand and reproduce the delicate timbral characteristics of musical instruments. Finally, addressing the continuity issue in timbre restoration, this paper introduces an innovative restoration technique to enhance the quality of damaged audio restoration and auditory consistency. Through the application of these methods, this study not only contributes to the protection and restoration of traditional timbres but also advances related audio processing technologies.
