HyperExtended LightFace: A Facial Attribute Analysis Framework

Serengil, Sefik Ilkin; Özpınar, Alper

doi:10.1109/iceet53442.2021.9659697

Cited by 140 publications

(39 citation statements)

References 12 publications


“…Also, based on previously obtained landmarks, a search for ROIs is performed, including the face region in each frame, the lips, and the hands. ROIs are shown in Figure 7c. The face region is fed to pre-trained models (accessed on 6 February 2023) from the Deepface open-source software platform [135, 136] for machine classification of the signer’s gender and age. Previously, we used these models in a similar gesture recognition problem [137].…”

Section: Methods (mentioning)

confidence: 99%

Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices

Ryumin

Ivanko

Ryumina

2023

Sensors


Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recognition. Hand gestures are a form of non-verbal communication and can be used as a very important part of modern human–computer interaction systems. Currently, audio and video modalities are easily accessible by sensors of mobile devices. However, there is no out-of-the-box solution for automatic audio-visual speech and gesture recognition. This study introduces two deep neural network-based model architectures: one for AVSR and one for gesture recognition. The main novelty regarding audio-visual speech recognition lies in fine-tuning strategies for both visual and acoustic features and in the proposed end-to-end model, which considers three modality fusion approaches: prediction-level, feature-level, and model-level. The main novelty in gesture recognition lies in a unique set of spatio-temporal features, including those that consider lip articulation information. As there are no available datasets for the combined task, we evaluated our methods on two different large-scale corpora—LRW and AUTSL—and outperformed existing methods on both audio-visual speech recognition and gesture recognition tasks. We achieved AVSR accuracy for the LRW dataset equal to 98.76% and gesture recognition rate for the AUTSL dataset equal to 98.56%. The results obtained demonstrate not only the high performance of the proposed methodology, but also the fundamental possibility of recognizing audio-visual speech and gestures by sensors of mobile devices.


“…It is a challenging problem due to internal factors, such as gender and race, and external factors, such as lifestyle or environment. Nowadays, this problem is addressed mostly using Deep Learning-based models that extract features automatically and do not depend on handcrafted features [8,9,10,11,12,13,14,15,16,17].…”

Section: Age Estimation Models (mentioning)

confidence: 99%

“…SSR-Net yielded an MAE of 3.16 on the MORPH2 dataset. Similarly, Serengil et al. [15] created a lightweight framework called Deepface for face recognition and facial attribute analysis, including age estimation. The age estimator is based on the VGG16 architecture and achieved an MAE of 4.65 on the IMDB-WIKI dataset.…”

Section: Age Estimation Models (mentioning)

confidence: 99%


Assessment of age estimation methods for forensic applications using non-occluded and synthetic occluded facial images

Jeuland

Ferreras

Chaves

et al. 2022

XLIII Jornadas De Automática: Libro De Actas: 7, 8 Y 9 De Septiembre De 2022, Logroño (La Rioja)


Age estimation is a valuable forensic tool for criminal investigators since it helps to identify minors or possible offenders in Child Sexual Exploitation Materials (CSEM). Nowadays, Deep Learning methods are considered state-of-the-art for general age estimation. However, they perform poorly when predicting the age of minors and older adults because these age groups are underrepresented in existing datasets. Moreover, offenders use facial occlusion in certain CSEM to hide the identity of the victims, which may also affect the performance of age estimators. In this work, we assess the performance of six deep-learning-based age estimators on non-occluded and occluded facial images. We selected the FG-Net and APPA-REAL datasets to evaluate the models under non-occluded conditions. To assess the models under occluded conditions, we created synthetically occluded versions of the non-occluded datasets by drawing black masks over the eyes and mouth to simulate the conditions observed in some CSEM images. Experimental results showed that the evaluated age estimators are affected more by eye occlusion than by mouth occlusion. Also, facial occlusion degrades the accuracy of age estimation for minors and the elderly more than for other age groups. We expect that this study could become an initial benchmark for age estimation under non-occluded and occluded conditions, especially for forensic applications like victim profiling on CSEM, where age estimation is essential.


“…For improved matching of the DBpedia thumbnails with the faces later recognized in the videos, it is important that the input to the face recognition model is always similar in terms of colors and pose. Since the DeepFace library [6] only performs rotations and no affine transformations, we implemented our own alignment function. Using ArcFace [4], we created a 512-dimensional vector representing the face of a person.…”

Section: Multi-modal Entity Linking (mentioning)

confidence: 99%

Semantic Video Entity Linking

Grams

Li

Tong

et al. 2022

The Semantic Web: ESWC 2022 Satellite Events


Knowledge graphs are an established technology in the field of information retrieval and question answering. However, the focus is mostly on searching web pages and related documents and less on video formats, so queries that refine searches over videos are often neglected. In this demo, we show a framework for recognizing faces in YouTube videos and linking them to the matching entities in DBpedia using the thumbnails available in DBpedia. By linking the videos from YouTube with the information from DBpedia, more complex search queries become possible. We will present both the frontend of the application, including the search, adding more YouTube videos, and formulating complex queries, as well as the architecture and the libraries used in the application.
