With the wide spread usage of smartphones and social media platforms, video logging is gaining an increasing popularity, especially after the advent of YouTube in 2005 with hundred millions of views per day. It has attracted interest of many people with immense emerging applications, e.g. filmmakers, journalists, product advertisers, entrepreneurs, educators and many others. Nowadays, people express and share their opinions online on various daily issues using different forms of content including texts, audios, images and videos. This study presents a multimodal approach for recognizing the speaker's age group from social media videos. Several structures of Artificial Neural Networks (ANNs) are presented and evaluated using standalone modalities. Moreover, a two-stage ensemble network is proposed to combine multiple modalities. In addition, a corpus of videos has been collected and prepared for multimodal age-group recognition with focus on Arabic language speakers. The experimental results demonstrated that combining different modalities can mitigate the limitations of unimodal recognition systems and lead to significant improvements in the results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.