Abstract. In this paper we propose a two-stage approach to organizing information in video surveillance systems. First, faces are detected in each frame and the video stream is split into sequences of frames containing the face region of one person. Second, the sequences (tracks) that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. Gender and age are estimated for each cluster (person) to facilitate use of the organized video collection. Particular attention is paid to the aggregation of features extracted from each frame with deep convolutional neural networks. Experimental results on the YTF and IJB-A datasets demonstrate that the most accurate and fastest solution is obtained by matching the normalized average of the feature vectors of all frames in a track.
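The track-level aggregation described above can be illustrated with a minimal sketch. It assumes that each frame has already been mapped to a feature vector by some face network, and that "normalized average" means the L2-normalized mean of those vectors. The function names and the toy two-dimensional features are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def track_descriptor(frame_features):
    """Average the per-frame feature vectors of one track, then L2-normalize.

    Assumption: this is one plausible reading of the "normalized average"
    mentioned in the abstract.
    """
    if not frame_features:
        raise ValueError("a track needs at least one frame")
    dim = len(frame_features[0])
    count = len(frame_features)
    mean = [sum(f[i] for f in frame_features) / count for i in range(dim)]
    return l2_normalize(mean)


def track_distance(a, b):
    """Euclidean distance between two track descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy features: tracks A and B point in a similar direction, track C does not.
track_a = [[1.0, 0.0], [0.8, 0.2]]
track_b = [[0.9, 0.1], [1.0, 0.1]]
track_c = [[0.0, 1.0], [0.1, 0.9]]

desc_a = track_descriptor(track_a)
desc_b = track_descriptor(track_b)
desc_c = track_descriptor(track_c)
```

With one descriptor per track, each pair of tracks is compared with a single distance computation, regardless of track length. Such pairwise distances are the kind of input a hierarchical agglomerative clustering step could consume.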
This article considers the problem of increasing the performance and accuracy of video face identification. We examine the selection of several best video frames using various techniques for assessing image quality. In contrast to traditional methods that estimate image brightness or contrast, we propose to use deep learning techniques that estimate frame quality with a lightweight convolutional neural network. To increase the effectiveness of the frame quality assessment step, we propose to distill the knowledge of the existing, cumbersome FaceQNet model, for which there is no publicly available training dataset. The selected K best frames are used to describe an input set of frames with a single average descriptor suitable for a nearest neighbor classifier. The proposed algorithm is compared with traditional face feature extraction for each frame, as well as with known clustering methods for a set of video frames.
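The K-best frame selection and single-descriptor matching described above can be sketched as follows. The sketch assumes that a quality score per frame is already available from some quality model (the distilled network itself is not reproduced here), that the average descriptor is L2-normalized, and that the gallery holds one descriptor per identity. All names and the toy data are hypothetical.

```python
import math


def select_k_best(frame_features, quality_scores, k):
    """Return the features of the k frames with the highest quality scores."""
    if len(frame_features) != len(quality_scores):
        raise ValueError("one quality score is required per frame")
    order = sorted(range(len(frame_features)),
                   key=lambda i: quality_scores[i], reverse=True)
    return [frame_features[i] for i in order[:k]]


def average_descriptor(features):
    """Mean of the selected feature vectors, L2-normalized (an assumption)."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0.0 else mean


def nearest_neighbor(descriptor, gallery):
    """Return the gallery label whose descriptor is closest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(gallery, key=lambda label: dist(descriptor, gallery[label]))


# Toy data: the third frame is low quality and looks like a different person.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
quality = [0.9, 0.8, 0.1]  # hypothetical scores from a quality model
gallery = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}

best = select_k_best(frames, quality, k=2)
video_descriptor = average_descriptor(best)
predicted = nearest_neighbor(video_descriptor, gallery)
```

Dropping low-quality frames before averaging keeps an outlier frame from pulling the video descriptor toward the wrong identity, while the classifier still performs one comparison per gallery entry.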
In this paper we develop a mobile age and gender recognition system based on deep convolutional neural networks. A brief literature survey of the age/gender recognition problem in retail applications is presented. A comparative analysis of classifier fusion algorithms that aggregate decisions for individual frames is provided. To improve age and gender identification accuracy, we implement a video-based recognition system with several aggregation methods. We provide an experimental comparison on the IJB-A, Indian Movies, Kinect and EmotiW2018 datasets. It is demonstrated that the most accurate decisions are obtained using the geometric mean and the mathematical expectation of the outputs at the softmax layers of the convolutional neural networks for gender recognition and age prediction, respectively. As a result, an offline application of the proposed system is implemented on the Android platform.
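The two aggregation rules named above can be illustrated with a short sketch. It assumes that each frame yields a softmax posterior over gender classes and another over discrete age values. The reading of "mathematical expectation" as the expected age under the frame-averaged posterior is an assumption, and the function names and toy numbers are hypothetical.

```python
import math


def geometric_mean_fusion(frame_posteriors):
    """Fuse per-frame class posteriors with a renormalized geometric mean.

    Computed in the log domain; a small floor avoids log(0).
    """
    n = len(frame_posteriors)
    num_classes = len(frame_posteriors[0])
    eps = 1e-12
    fused = []
    for c in range(num_classes):
        mean_log = sum(math.log(max(p[c], eps)) for p in frame_posteriors) / n
        fused.append(math.exp(mean_log))
    total = sum(fused)
    return [v / total for v in fused]


def expected_age(frame_posteriors, ages):
    """Expected age under the frame-averaged age posterior (an assumed reading)."""
    n = len(frame_posteriors)
    mean_post = [sum(p[i] for p in frame_posteriors) / n for i in range(len(ages))]
    return sum(a * p for a, p in zip(ages, mean_post))


# Toy gender posteriors for three frames; class 0 dominates in two of them.
gender_frames = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]]
gender_fused = geometric_mean_fusion(gender_frames)

# Toy age posteriors over three candidate ages for two frames.
ages = [20, 30, 40]
age_frames = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
age_estimate = expected_age(age_frames, ages)
```

The geometric mean penalizes a class that any single frame considers very unlikely, whereas the expectation turns the age posterior into a continuous estimate instead of picking the single most probable age.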