Nepali Text to Speech Synthesis System using ESNOLA Method of Concatenation

Chettri, Bhusan; Shah, Krishna Bikram

doi:10.5120/10053-4909

“…According to [15], "The vocalized form of human communication is termed as audio, each of our spoken word is created out of phonetic combination of a limited set of vowel and consonant audio, which are the sound units in audio synthesis". Even speaking the exact same word(s), with different speed, loudness, pitch, and accent, including both cultural and age-related differences, may lead to different results.…”

Section: Audio Based Feature Extraction (mentioning)

confidence: 99%

Protected multimodal emotion recognition

Tang

2021

Preprint



In this thesis, we propose Protected Multimodal Emotion Recognition (PMM-ER), an emotion recognition approach that includes security features against the growing rate of cyber-attacks on various databases, including emotion databases. An analysis of frequently used encryption algorithms led to the modified encryption algorithm proposed in this work. The system recognizes seven emotional states, namely happiness, sadness, surprise, fear, disgust, and anger, as well as a neutral state, based on 2D video frames, 3D vertices, and audio waveform information. Several well-known features are employed, including the HSV colour feature, iterative closest point (ICP), and Mel-frequency cepstral coefficients (MFCCs). We also propose a novel feature fusion approach that combines decision-level and feature-level fusion, and we compare well-known classification and feature extraction algorithms such as principal component analysis (PCA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA).
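The abstract distinguishes feature-level from decision-level fusion without spelling out the difference. The sketch below is a generic illustration, not the PMM-ER method: feature-level fusion joins per-modality feature vectors (here L2-normalized per modality, an assumed choice) into one vector for a single classifier, while decision-level fusion combines each modality classifier's class probabilities with a weighted sum rule. The function names, the modality keys, the weights, and the label order in `EMOTIONS` are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical label order, taken from the seven states listed in the abstract.
EMOTIONS = ["happiness", "sadness", "surprise", "fear", "disgust", "anger", "neutral"]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale one modality's feature vector to unit length so no modality dominates."""
    norm = sum(x * x for x in vec) ** 0.5 or 1.0  # leave all-zero vectors unchanged
    return [x / norm for x in vec]


def feature_level_fusion(features: Dict[str, Sequence[float]]) -> List[float]:
    """Feature-level fusion: concatenate normalized per-modality features."""
    fused: List[float] = []
    for name in sorted(features):  # fixed modality order keeps vector layout stable
        fused.extend(l2_normalize(features[name]))
    return fused


def decision_level_fusion(
    scores: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> str:
    """Decision-level fusion: weighted sum of per-modality class probabilities."""
    total = [0.0] * len(EMOTIONS)
    for name, probs in scores.items():
        if len(probs) != len(EMOTIONS):
            raise ValueError(f"{name}: expected {len(EMOTIONS)} class scores")
        w = weights.get(name, 1.0)  # unlisted modalities default to weight 1
        for i, p in enumerate(probs):
            total[i] += w * p
    best = max(range(len(EMOTIONS)), key=lambda i: total[i])
    return EMOTIONS[best]


if __name__ == "__main__":
    print(feature_level_fusion({"audio_mfcc": [3.0, 4.0], "video_hsv": [0.0, 2.0]}))
    audio = [0.1, 0.5, 0.1, 0.1, 0.1, 0.05, 0.05]
    video = [0.6, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]
    print(decision_level_fusion({"audio": audio, "video": video}, {"audio": 1.0, "video": 1.0}))
```

With equal weights the video classifier's confident "happiness" wins. Doubling the audio weight flips the fused decision to "sadness", which shows how decision-level fusion lets each modality's reliability be tuned separately. Feature-level fusion instead defers that balance to whatever single classifier (for example, one trained after PCA or LDA) consumes the concatenated vector.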
