2015 International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI: 10.1109/acii.2015.7344602

Bimodal feature-based fusion for real-time emotion recognition in a mobile context

Cited by 8 publications (6 citation statements)
References 28 publications
“…For the lexical modality, sparse features drawn from hand-crafted affective dictionaries are dominant in current studies, e.g., Linguistic Inquiry and Word Count (LIWC) [14] based lexical features [15] and WordNetAffect [16] based lexical features [17]. However, current paralinguistic studies on human-human dialogue suggest that besides lexical content, other phenomena in speech are also indicators of emotion.…”
Section: Features (mentioning)
confidence: 99%
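
As an illustration of the dictionary-based lexical features this excerpt describes, the following is a minimal sketch of deriving sparse category counts from a hand-crafted affect lexicon. The lexicon contents and category names are hypothetical placeholders, not the actual LIWC or WordNetAffect resources used in the cited studies.

    # Minimal sketch: sparse lexical features from a hand-crafted affect lexicon.
    # The lexicon below is a hypothetical placeholder, not LIWC or WordNetAffect.
    from collections import Counter

    AFFECT_LEXICON = {
        "joy":     {"happy", "glad", "delighted", "great"},
        "anger":   {"angry", "furious", "annoyed", "hate"},
        "sadness": {"sad", "unhappy", "miserable", "cry"},
    }

    def lexical_features(utterance: str) -> dict:
        """Count how many tokens of the utterance fall into each affect category.
        Most categories stay at zero for a short utterance, hence 'sparse'."""
        tokens = utterance.lower().split()
        counts = Counter()
        for token in tokens:
            for category, words in AFFECT_LEXICON.items():
                if token.strip(".,!?") in words:
                    counts[category] += 1
        # Normalise by utterance length so features are comparable across utterances.
        return {cat: counts[cat] / max(len(tokens), 1) for cat in AFFECT_LEXICON}

    print(lexical_features("I am so happy and delighted today!"))
    # -> {'joy': 0.2857..., 'anger': 0.0, 'sadness': 0.0}
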
“…In FL fusion (e.g., [15]), feature sets from different modalities are concatenated before performing recognition, as shown in Figure 1. In some studies, feature engineering is first applied to the concatenated feature set or individual feature sets (e.g., [17]). However, it is hard to apply knowledge about different modalities in FL fusion.…”
Section: Multimodal Emotion Recognition (mentioning)
confidence: 99%
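
A minimal sketch of the feature-level (FL) fusion scheme this excerpt refers to: per-modality feature vectors are concatenated into one joint vector before a single classifier is trained. The dimensionalities, the synthetic data, and the SVM classifier are illustrative assumptions, not the cited systems' actual configuration.

    # Feature-level (early) fusion sketch: concatenate per-modality features,
    # then train one classifier on the joint vector. Dimensions and the SVM
    # choice are illustrative assumptions, not the cited systems' setup.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_samples = 200
    acoustic = rng.normal(size=(n_samples, 40))    # e.g. prosodic/spectral features
    lexical  = rng.normal(size=(n_samples, 10))    # e.g. affect-dictionary counts
    labels   = rng.integers(0, 7, size=n_samples)  # seven emotion classes

    # Early fusion: a single joint feature vector per utterance.
    fused = np.concatenate([acoustic, lexical], axis=1)

    X_train, X_test, y_train, y_test = train_test_split(
        fused, labels, test_size=0.25, random_state=0)

    clf = SVC(kernel="rbf").fit(X_train, y_train)
    print("accuracy on held-out data:", clf.score(X_test, y_test))

Because both modalities share one model and one feature space, modality-specific knowledge (e.g. different noise characteristics of speech and text) is hard to exploit here, which is the limitation the excerpt points out.
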
“…The authors also reported an accuracy of 80.36% for audio-visual emotion recognition while employing a hybrid deep model architecture [23]. The corpus was also used for real-time bi-modal emotion detection in a mobile context, where seven emotions were classified and the results were reported in terms of precision (90.8), recall (90.7), and F1-measure (90.7) for a feature-level fusion [156].…”
Section: RML (mentioning)
confidence: 99%
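
For context on the precision/recall/F1 figures quoted above, the sketch below shows one common way such scores are computed for a seven-class emotion task. The labels and predictions are random placeholders, and the weighted averaging is an assumption; the excerpt does not state which averaging scheme the cited work used.

    # Sketch: computing precision, recall and F1 for a 7-class emotion task.
    # Labels and predictions are random placeholders; 'weighted' averaging is
    # an assumption -- the excerpt does not state the averaging scheme used.
    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support

    rng = np.random.default_rng(1)
    y_true = rng.integers(0, 7, size=100)
    y_pred = rng.integers(0, 7, size=100)

    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
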
“…SAVEE DB was used for exploring the sources of temporal variation in human audio-visual behavioral data by introducing temporal segmentation and time-series analysis techniques [19]. In a bi-modal fusion of linguistic and acoustic cues in speech, SAVEE was used for affect recognition at the language level using both ML and valence assessment of the words for the classification of 7 emotions [156]. In an affective human-robot interaction, the real-time fusion of facial expressions and speech from SAVEE using 3 DBNs (two for classifying and the third for fusing the output of the first two) resulted in an accuracy of 96.2%.…”
Section: SAVEE (mentioning)
confidence: 99%
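
The three-network arrangement mentioned above (two classifiers plus a third model that fuses their outputs) is essentially decision-level fusion by stacking. The sketch below reproduces that structure with plain logistic-regression models on synthetic data as stand-ins for the DBNs and the SAVEE features used in the cited work.

    # Decision-level fusion sketch in the spirit of the 3-model scheme above:
    # one classifier per modality, plus a third model trained on their class
    # posteriors. Logistic regression and synthetic data stand in for the DBNs
    # and the SAVEE features used in the cited work.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    n, n_classes = 300, 7
    face   = rng.normal(size=(n, 30))             # facial-expression features (placeholder)
    speech = rng.normal(size=(n, 40))             # speech features (placeholder)
    y      = rng.integers(0, n_classes, size=n)

    idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

    # First two models: one per modality.
    clf_face   = LogisticRegression(max_iter=1000).fit(face[idx_train], y[idx_train])
    clf_speech = LogisticRegression(max_iter=1000).fit(speech[idx_train], y[idx_train])

    # Third model: fuses the class posteriors of the first two.
    # (In practice the fuser is usually trained on out-of-fold posteriors
    # to avoid overfitting; reusing the training split keeps the sketch short.)
    posteriors = np.hstack([clf_face.predict_proba(face[idx_train]),
                            clf_speech.predict_proba(speech[idx_train])])
    fuser = LogisticRegression(max_iter=1000).fit(posteriors, y[idx_train])

    test_posteriors = np.hstack([clf_face.predict_proba(face[idx_test]),
                                 clf_speech.predict_proba(speech[idx_test])])
    print("fused accuracy:", fuser.score(test_posteriors, y[idx_test]))
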
“…Therefore, there are several types of data sources such as speech, text, facial expression, body movement, and physiological measurements like EEG, finger temperature, skin conductance level, heart rate, and muscle activity [11, 13, 14, 69–73]. There is a wealth of emotion data for speech, text, and facial expression, as these can be easily gathered from devices that people use daily, such as cellphones and computers [75,76]. On the other hand, physiological signals can also be a good source of information since they can be collected continuously without participants interfering [77,78].…”
Section: EEG (mentioning)
confidence: 99%