A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database

Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin

doi:10.1109/tip.2015.2493448

Cited by 136 publications

(83 citation statements)

References 42 publications

“…Fig 2(a) shows ROIs of 5 selected target individuals and their test video correspondence is recorded with 3 cameras above different portals. COX-S2V [37] consists of 1000 subjects, where each subject has a high quality still image under controlled conditions, and four lower-quality facial trajectories captured under uncontrolled conditions. Each trajectory has 25 faces, where ROIs taken from these videos encounter changes in illumination, expression, scale, viewpoint, and blur.…”

Section: Methodology For Validation (mentioning)

confidence: 99%

An Extended Sparse Classification Framework for Domain Adaptation in Video Surveillance

Nourbakhsh

Granger

Fumera

2017

Computer Vision – ACCV 2016 Workshops

Abstract. Still-to-video face recognition (FR) systems used in video surveillance applications capture facial trajectories across a network of distributed video cameras and compare them against stored distributed facial models. Currently, the performance of state-of-the-art systems is severely affected by changes in facial appearance caused by variations in, e.g., pose, illumination and scale in different camera viewpoints. Moreover, since an individual is typically enrolled using one or few reference stills captured during enrolment, face models are not robust to intra-class variation. In this paper, the Extended Sparse Representation Classification through Domain Adaptation (ESRC-DA) algorithm is proposed to improve performance of still-to-video FR. The system's facial models are thereby enhanced by integrating variational information from its operational domain. In particular, robustness to intra-class variations is improved by exploiting: (1) an under-sampled dictionary from target reference facial stills captured under controlled conditions; and (2) an auxiliary dictionary from an abundance of unlabelled facial trajectories captured under different conditions, from each camera viewpoint in the surveillance network. Accuracy and efficiency of the proposed technique are compared to state-of-the-art still-to-video FR techniques using videos from the Chokepoint and COX-S2V databases. Results indicate that ESRC-DA with dictionary learning of unlabelled trajectories provides the highest level of accuracy, while maintaining a low complexity.

Introduction

With the availability of low-cost video cameras and high capacity memory, technologies for video surveillance (VS) have become more prevalent in recent years. VS networks are increasingly deployed by public security organizations in, e.g., airports, train stations and border crossings. Accurate and robust systems are required to recognize individuals and their actions from video feeds. In VS, decision support systems can rely on facial information (along with other sources, like soft biometrics) to alert an analyst as to the presence of individuals of interest. The ability to automatically recognize faces in videos recorded across…

“…In order to evaluate the performance of the proposed S+V model for still-to-video FR, an extensive series of experiments are conducted on Chokepoint [24] and COX-S2V [25] datasets. Chokepoint [24] and COX-S2V [25] datasets are suitable for ex-…”

Section: Datasets (mentioning)

confidence: 99%

“…Moreover, since most state-of-the-art FR methods rely on Convolution Neural Network (CNN) architectures such as ResNet [20] and VGGNet [21], the model is fed with CNN features extracted from the atoms of dictionaries [22,23], in order to further improve still-to-video FR accuracy. Performance of the SRC implementation is evaluated on two public video FR databases: Chokepoint [24] and COX-S2V [25].…”

Section: Introduction (mentioning)

confidence: 99%

A paired sparse representation model for robust face recognition from a single sample

Mokhayeri

Granger

2020

Pattern Recognition

Sparse representation-based classification (SRC) has been shown to achieve a high level of accuracy in face recognition (FR). However, matching faces captured in unconstrained video against a gallery with a single reference facial still per individual typically yields low accuracy. For improved robustness to intra-class variations, SRC techniques for FR have recently been extended to incorporate variational information from an external generic set into an auxiliary dictionary. Despite their success in handling linear variations, non-linear variations (e.g., pose and expressions) between probe and reference facial images cannot be accurately reconstructed with a linear combination of images in the gallery and auxiliary dictionaries because they do not share the same type of variations. In order to account for non-linear variations due to pose, a paired sparse representation model is introduced allowing for joint use of variational information and synthetic face images. The proposed model, called synthetic plus variational model, reconstructs a probe image by jointly using (1) a variational dictionary and (2) a gallery dictionary augmented with a set of synthetic images generated over a wide diversity of pose angles. The augmented gallery dictionary is then encouraged to pair the same sparsity pattern with the variational dictionary for similar pose angles by solving a newly formulated simultaneous sparsity-based optimization problem. Experimental results obtained on Chokepoint and COX-S2V datasets, using different face representations, indicate that the proposed approach can outperform state-of-the-art SRC-based methods for still-to-video FR with a single sample per person.

“…There are 293 identities from PaSC testing dataset, including 9376 still images (about 32 images per subject), and 2802 videos (approximately ten videos per person and 100 frames per video). Another dataset is COX [25], which has 1,000 subjects, 3 videos captured for each subject with 3 different camcorders. An interactive tool for annotating facial attributes was developed, which displayed multiple face images from the same subject.…”

Section: A Data (mentioning)

confidence: 99%

“…The COX [25] consists of 1,000 subjects and three videos for each subject. We focus on the videos, which contain several frames, and demonstrate the attribute inconsistency issue.…”

Section: Videos On Cox (mentioning)

confidence: 99%

Attributes in Multiple Facial Images

Li

Guo

2018

2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)

Facial attribute recognition is conventionally computed from a single image. In practice, each subject may have multiple face images. Taking the eye size as an example, it should not change, but it may have different estimation in multiple images, which would make a negative impact on face recognition. Thus, how to compute these attributes corresponding to each subject rather than each single image is a profound work. To address this question, we deploy deep training for facial attributes prediction, and we explore the inconsistency issue among the attributes computed from each single image. Then, we develop two approaches to address the inconsistency issue. Experimental results show that the proposed methods can handle facial attribute estimation on either multiple still images or video frames, and can correct the incorrectly annotated labels. The experiments are conducted on two large public databases with annotations of facial attributes.
