Accurate emotion recognition remains a challenging problem in human–machine interaction. This paper explores whether feature abstraction and fusion can both be performed by a homogeneous network component, and proposes a dual-modal emotion recognition framework composed of a parallel convolution (Pconv) module and an attention-based bidirectional long short-term memory (BLSTM) module. The Pconv module extracts multidimensional social features in parallel and provides a more effective representation capacity. The attention-based BLSTM module strengthens the extraction of key information and preserves the relevance between pieces of information. Experiments on the CH-SIMS dataset show a recognition accuracy of 74.70% on audio data and 77.13% on text, while the dual-modal fusion model reaches 90.02%. The experiments demonstrate that heterogeneous information can be processed within a homogeneous network component, and that the attention-based BLSTM module coordinates best with the feature fusion performed by the Pconv module. This offers considerable flexibility for modality expansion and architecture design.
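The abstract above does not specify how attention is applied to the BLSTM outputs. As a rough illustration only, the sketch below shows one common form, softmax attention pooling, in which each time step's hidden vector is scored against a query vector and the sequence is collapsed into a weighted sum. The function name `attention_pool`, the dot-product scoring, and the toy vectors are all assumptions made for this example, not details taken from the paper.

```python
import math

def attention_pool(hidden_states, query):
    """Collapse a sequence of hidden vectors into one context vector.

    hidden_states: list of equal-length vectors, one per time step
                   (stand-ins for BLSTM outputs; hypothetical values).
    query: vector of the same length (in a real model this is learned).
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]

    # Numerically stable softmax over the time steps.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the hidden vectors gives the context vector.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights

# Toy sequence of three 2-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
query = [1.0, 0.0]
context, weights = attention_pool(states, query)
```

In this toy input the third time step aligns most strongly with the query, so it receives the largest weight and dominates the context vector. This is the sense in which attention can emphasize key information in a sequence.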
<abstract> <p>As Covid-19 spreads across the world, people's lifestyles face a series of changes and challenges, which in turn place new demands on automation facilities. For example, masks have become almost a necessity in public places. However, most access control systems (ACS) cannot recognize people wearing masks or authenticate their identities, which limits their usefulness under increasingly serious epidemic pressure. Consequently, many public entrances have switched to attendant-based operation, which brings low efficiency, a risk of infection, and a high likelihood of negligence. This paper proposes a new security classification framework based on face recognition. The framework combines a mask detection algorithm with a face authentication algorithm that includes an anti-spoofing function. To evaluate its performance, the paper uses the Chinese Academy of Sciences Institute of Automation Face Anti-Spoofing Dataset (CASIA-FASD) and the Replay-Attack dataset as benchmarks. The evaluation yields a Half Total Error Rate (HTER) of 9.7% and an Equal Error Rate (EER) of 5.5%, with an average processing time of 0.12 seconds per frame. The results demonstrate that the framework has strong anti-spoofing capability and can run on an embedded system to perform mask detection and face authentication in real time.</p> </abstract>
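The two metrics reported in the abstract above have standard definitions: HTER is the mean of the false acceptance rate (FAR) and the false rejection rate (FRR) at a fixed threshold, and EER is the error rate at the threshold where FAR and FRR are equal. The sketch below computes both from liveness scores. The score lists, the convention that a higher score means "genuine", and the helper names are illustrative assumptions, not the paper's data or code.

```python
def error_rates(genuine_scores, attack_scores, threshold):
    """Return (FAR, FRR) for a given decision threshold.

    A sample is accepted as genuine when its score >= threshold
    (an assumed convention for this example).
    """
    # FAR: fraction of attack samples wrongly accepted.
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # FRR: fraction of genuine samples wrongly rejected.
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr

def half_total_error_rate(far, frr):
    """HTER is the average of FAR and FRR."""
    return (far + frr) / 2

def approximate_eer(genuine_scores, attack_scores):
    """Sweep observed scores as thresholds; return (EER estimate, threshold).

    The estimate is taken where |FAR - FRR| is smallest, which only
    approximates the true EER on a finite set of scores.
    """
    candidates = sorted(set(genuine_scores) | set(attack_scores))
    best_gap, best_eer, best_threshold = None, None, None
    for t in candidates:
        far, frr = error_rates(genuine_scores, attack_scores, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2, t
    return best_eer, best_threshold

# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
attack = [0.1, 0.2, 0.3, 0.6]

far, frr = error_rates(genuine, attack, threshold=0.5)
hter_value = half_total_error_rate(far, frr)
eer, eer_threshold = approximate_eer(genuine, attack)
```

In common anti-spoofing evaluation practice, the threshold is fixed at the EER point of a development set and HTER is then reported on a separate test set, which is one reason the two numbers can differ. Whether the paper follows that exact protocol is not stated in the abstract.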