This paper is focused on the automatic extraction of persons and their attributes (gender, year of birth) from albums of photos and videos. A two-stage approach is proposed, in which a convolutional neural network first simultaneously predicts age and gender from all photos and additionally extracts facial representations suitable for face identification. Here, the MobileNet architecture is modified and preliminarily trained to perform face recognition in order to additionally recognize age and gender. Age is estimated as the expected value of the top predictions of the neural network. In the second stage of the proposed approach, the extracted faces are grouped using hierarchical agglomerative clustering techniques. The birth year and gender of the person in each cluster are estimated by aggregating the predictions for individual photos. The proposed approach is implemented in an Android mobile application. It is experimentally demonstrated that the quality of facial clustering for the developed network is competitive with the state-of-the-art results achieved by deep neural networks, though the implementation of the proposed approach is much less computationally expensive. Moreover, this approach achieves more accurate age/gender recognition compared to publicly available models.
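To make the described pipeline concrete, the following is a minimal sketch of its two stages: estimating age as the expected value over the top age-class predictions, grouping face embeddings with hierarchical agglomerative clustering, and aggregating per-photo predictions within each cluster. The function names, the `top_k` and `distance_threshold` parameters, and the use of scikit-learn's `AgglomerativeClustering` with average linkage are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def expected_age(age_probs, age_values, top_k=5):
    """Estimate age as the expected value over the top-k age-class predictions (assumed k)."""
    top = np.argsort(age_probs)[-top_k:]           # indices of the k most probable age classes
    p = age_probs[top] / age_probs[top].sum()      # renormalize the top-k probabilities
    return float(np.dot(p, age_values[top]))       # expectation of age over those classes


def group_faces(embeddings, distance_threshold=0.9):
    """Group facial embeddings with hierarchical agglomerative clustering (threshold is illustrative)."""
    clustering = AgglomerativeClustering(
        n_clusters=None,
        distance_threshold=distance_threshold,
        metric="euclidean",
        linkage="average",
    )
    return clustering.fit_predict(embeddings)


def aggregate_clusters(labels, ages, gender_scores, photo_years):
    """Aggregate per-photo predictions into a birth year and gender for each cluster."""
    result = {}
    for c in np.unique(labels):
        idx = labels == c
        birth_year = int(np.median(photo_years[idx] - ages[idx]))       # photo date minus estimated age
        gender = "male" if gender_scores[idx].mean() > 0.5 else "female"
        result[int(c)] = (birth_year, gender)
    return result
```

In this sketch the per-photo age estimates and gender scores would come from the modified MobileNet, whose penultimate-layer features also serve as the embeddings passed to `group_faces`.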