Stress is one of the most serious concerns in modern life. High stress levels can lead to various diseases as well as a loss of focus and productivity at work. People under stress often fail to recognize their own stress levels, which makes early stress detection essential. Recently, multimodal fusion has enhanced the performance of stress detection models built with Deep Learning (DL) techniques. The low-, mid-, and high-level features of a Convolutional Neural Network (CNN) are each discriminative, and a more comprehensive feature representation can be obtained by fusing features from all three levels. This study focuses on detecting stress by exploiting these advantages through multimodal hierarchical CNN feature fusion. The two physiological signals used in this study are Electrodermal Activity (EDA) and Electrocardiogram (ECG). We build a hierarchical feature set by concatenating multi-level CNN features for each modality, and perform multimodal fusion of the two hierarchical feature sets using the Multimodal Transfer Module (MMTM). Experiments are carried out with raw frequency-domain data and with features extracted from the frequency bands to study the effectiveness of both. The model's performance is compared against different combinations of hierarchical features from the low, mid, and high levels. To verify generalizability, the proposed approach is evaluated on four benchmark datasets: ASCERTAIN, CLAS, MAUS, and WAUC. The proposed method demonstrates its effectiveness by outperforming existing models by 1-2% on frequency band features, and the hierarchical feature set from all three levels performs better than all other combinations by 2-4%. These results suggest that this strategy can be a useful addition to stress detection.
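To make the described architecture concrete, the following is a minimal sketch, not the authors' code, of how hierarchical (low/mid/high-level) CNN features per modality could be concatenated and then fused across EDA and ECG with an MMTM-style module. The backbone, channel dimensions, pooling choices, and function names are illustrative assumptions; only the squeeze-joint-excitation gating pattern follows the published MMTM design.

```python
# Hypothetical sketch of hierarchical feature concatenation + MMTM fusion
# for two physiological modalities (EDA and ECG). Dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MMTM(nn.Module):
    """Simplified Multimodal Transfer Module (Joze et al., 2020):
    squeeze both modalities, learn a joint representation, then
    re-weight each modality's channels with sigmoid gates."""

    def __init__(self, dim_eda, dim_ecg, ratio=4):
        super().__init__()
        dim_joint = (dim_eda + dim_ecg) // ratio
        self.fc_joint = nn.Linear(dim_eda + dim_ecg, dim_joint)
        self.fc_eda = nn.Linear(dim_joint, dim_eda)
        self.fc_ecg = nn.Linear(dim_joint, dim_ecg)

    def forward(self, feat_eda, feat_ecg):
        # feat_*: (batch, channels, time) feature maps from 1D CNN branches
        sq_eda = feat_eda.mean(dim=-1)   # squeeze: global average pooling over time
        sq_ecg = feat_ecg.mean(dim=-1)
        joint = F.relu(self.fc_joint(torch.cat([sq_eda, sq_ecg], dim=1)))
        gate_eda = torch.sigmoid(self.fc_eda(joint)).unsqueeze(-1)
        gate_ecg = torch.sigmoid(self.fc_ecg(joint)).unsqueeze(-1)
        return feat_eda * gate_eda, feat_ecg * gate_ecg


def hierarchical_features(low, mid, high):
    """Concatenate low/mid/high-level CNN features along the channel axis
    after pooling them to a common temporal length (an assumption)."""
    target_len = high.shape[-1]
    return torch.cat([F.adaptive_avg_pool1d(low, target_len),
                      F.adaptive_avg_pool1d(mid, target_len),
                      high], dim=1)


# Toy usage with random tensors standing in for per-modality CNN features.
if __name__ == "__main__":
    b = 8
    eda = hierarchical_features(torch.randn(b, 32, 128),
                                torch.randn(b, 64, 64),
                                torch.randn(b, 128, 32))
    ecg = hierarchical_features(torch.randn(b, 32, 128),
                                torch.randn(b, 64, 64),
                                torch.randn(b, 128, 32))
    fused_eda, fused_ecg = MMTM(dim_eda=224, dim_ecg=224)(eda, ecg)
    print(fused_eda.shape, fused_ecg.shape)  # (8, 224, 32) for each modality
```

The fused, gated features from both modalities would then feed a downstream classifier; that head, and how the raw frequency-domain versus frequency-band inputs are prepared, is left out here because the abstract does not specify it.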
Humans involuntarily tend to infer parts of a conversation from lip movements when speech is absent or corrupted by external noise. In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the lip movements of a speaker. Acknowledging the importance of contextual and speaker-specific cues for accurate lip-reading, we take a different path from existing works. We focus on learning accurate lip-sequence-to-speech mappings for individual speakers in unconstrained, large-vocabulary settings. We collect and release a large-scale benchmark dataset, the first of its kind, specifically to train and evaluate the single-speaker lip to speech task in natural settings. We propose a novel approach with key design choices to achieve accurate, natural lip to speech synthesis in such unconstrained scenarios for the first time. Extensive evaluation using quantitative and qualitative metrics as well as human evaluation shows that our method is four times more intelligible than previous works in this space.