Gender recognition has been among the most investigated problems in recent years; although several contributions have been proposed, gender recognition in unconstrained environments is still a challenging problem and a definitive solution has not been found yet. Furthermore, Deep Convolutional Neural Networks (DCNNs) achieve very promising performance, but they typically require a huge amount of computational resources (CPU, GPU, RAM, storage) that are not always available in real systems, due to their cost or to specific application constraints (e.g. when the application must run directly on board low-power smart cameras, as in digital signage). In recent years the Machine Learning community has developed an interest in optimizing the efficiency of Deep Learning solutions, in order to make them portable and widespread. In this work we propose a compact DCNN architecture for Gender Recognition from face images that achieves accuracy close to the state of the art at a strongly reduced computational cost (almost five times lower). We also perform a sensitivity analysis to show how changes in the architecture of the network influence the trade-off between accuracy and speed. In addition, we compare our optimized architecture with popular efficient CNNs on common benchmark datasets widely adopted in the scientific community, namely LFW, MIVIA-Gender, IMDB-WIKI and Adience, demonstrating the effectiveness of the proposed solution.
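To give a concrete idea of what a compact architecture of this kind can look like, the sketch below defines a small depthwise-separable CNN for binary gender classification in PyTorch. The layer widths, depth and 96x96 input size are illustrative assumptions made for the example, not the architecture proposed in the paper.

```python
# Illustrative sketch of a compact CNN for binary gender classification.
# Layer widths, depth and the 96x96 input size are assumptions for the example;
# they are NOT the architecture proposed in the paper.
import torch
import torch.nn as nn

class CompactGenderNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()

        def block(c_in: int, c_out: int) -> nn.Sequential:
            # Depthwise-separable style block to keep the parameter count low.
            return nn.Sequential(
                nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False),
                nn.Conv2d(c_in, c_out, 1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            block(32, 64),
            block(64, 128),
            block(128, 256),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = CompactGenderNet()
    faces = torch.randn(1, 3, 96, 96)   # one 96x96 RGB face crop
    logits = model(faces)               # shape (1, 2): female/male scores
    n_params = sum(p.numel() for p in model.parameters())
    print(logits.shape, f"{n_params / 1e6:.2f}M parameters")
```

Counting the parameters, as in the last lines, is the simplest way to compare such a compact model against heavier backbones when studying the accuracy/speed trade-off.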
Although in recent years we have witnessed an explosion of scientific research on the recognition of facial soft biometrics such as gender, age and expression with deep neural networks, the recognition of ethnicity has not received the same attention from the scientific community. The growth of this field is hindered by two related factors: on the one hand, the absence of a sufficiently large and representative dataset prevents effective training of convolutional neural networks for ethnicity recognition; on the other hand, collecting new ethnicity datasets is far from simple and must be carried out manually by humans trained to recognize the basic ethnicity groups from somatic facial features. To fill this gap in facial soft biometrics analysis, we propose the VGGFace2 Mivia Ethnicity Recognition (VMER) dataset, composed of more than 3,000,000 face images annotated with 4 ethnicity categories, namely African American, East Asian, Caucasian Latin and Asian Indian. The final annotations are obtained with a protocol that requires the opinion of three people belonging to different ethnicities, in order to avoid the bias introduced by the well-known other-race effect. In addition, we carry out a comprehensive performance analysis of popular deep network architectures, namely VGG-16, VGG-Face, ResNet-50 and MobileNet v2. Finally, we perform a cross-dataset evaluation to demonstrate that the deep network architectures trained on VMER generalize to different test sets better than the same models trained on the largest ethnicity dataset available so far. The ethnicity labels of the VMER dataset and the code used for the experiments are available upon request at https://mivia.unisa.it.
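As an illustration of the three-annotator protocol, the sketch below aggregates the per-image votes into a final ethnicity label by majority. The data layout, the tie-handling rule and the review flag are assumptions made for the example, not the exact VMER annotation pipeline.

```python
# Minimal sketch of a three-annotator labelling protocol: each face receives one
# vote per annotator (annotators of different ethnicities, to mitigate the
# other-race effect) and the final label is the majority vote.
# The data layout, tie handling and review flag are illustrative assumptions.
from collections import Counter
from typing import Optional, Tuple

ETHNICITIES = ["African American", "East Asian", "Caucasian Latin", "Asian Indian"]

def aggregate_votes(votes: list) -> Tuple[Optional[str], bool]:
    """Return (majority label, needs_review) for one image's annotator votes."""
    label, n = Counter(votes).most_common(1)[0]
    if n >= 2:                 # at least two of the three annotators agree
        return label, False
    return None, True          # full disagreement: flag for re-annotation

if __name__ == "__main__":
    votes_per_image = {
        "img_000001.jpg": ["East Asian", "East Asian", "East Asian"],
        "img_000002.jpg": ["Caucasian Latin", "Asian Indian", "Caucasian Latin"],
        "img_000003.jpg": ["African American", "East Asian", "Asian Indian"],
    }
    for image, votes in votes_per_image.items():
        label, needs_review = aggregate_votes(votes)
        print(image, "->", label if not needs_review else "UNRESOLVED (re-annotate)")
```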