Benchmarking Still-to-Video Face Recognition via Partial and Local Linear Discriminant Analysis on COX-S2V Dataset

Huang, Zhiwu; Shan, Shiguang; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin

doi:10.1007/978-3-642-37444-9_46

Cited by 24 publications

(30 citation statements)

References 13 publications

“…Recently, in [22] partial and local linear discriminant analysis has been proposed using samples containing a high-quality still and a set of low resolution video sequences of each individual for still-to-video FR as a baseline on the COX-S2V dataset. Similarly, coupling quality and geometric alignment with recognition [23] has been proposed, where the best qualified frames from video are selected to match against well-aligned high-quality face stills with the most similar quality.…”

Section: State-of-the-art Techniques (mentioning)

confidence: 99%

“…However, watch-list screening is a challenging problem, and performance of the state-of-the-art still-to-video FR systems decline due to semi- or uncontrolled conditions and camera inter-operability [10], [22]. Nuisance factors that cause changes in facial appearance are mostly variations in illumination, pose, scale, resolution, expression, motion blur, and occlusion [5].…”

Section: State-of-the-art Techniques (mentioning)

confidence: 99%

“…4 Experimental Methodology Different aspects of the proposed framework are evaluated experimentally using Chokepoint [50] and COX-S2V [22] still-to-video datasets. First, experiments assess the performance of classifiers trained on ROI patterns extracted using different feature extraction techniques.…”

Section: Spatio-temporal Fusion (mentioning)

confidence: 99%

“…The performance of the still-to-video FR systems designed according to the proposed framework are compared to reference systems [6], [41], [53] using videos from the publicly-available Chokepoint [50] and COX-S2V [22] datasets. Accuracy and efficiency are measured at the transaction-level (matching of input probe ROI against reference ROI) and at the trajectory-level (the entire FR system over multiple frames).…”

Section: Introduction (mentioning)

confidence: 99%

Robust watch-list screening using dynamic ensembles of SVMs based on multiple face representations

Bashbaghi, Granger, Sabourin, et al. 2017

Machine Vision and Applications

Abstract. Still-to-video face recognition (FR) is an important function in video surveillance, where faces captured over a network of video cameras are matched against reference stills of target individuals. Screening faces against a watch-list is a challenging video surveillance application because the appearance of faces vary due to changing capture conditions and operational domains. The facial models used for matching may not be representative of faces captured with video cameras because they are typically designed a priori with only one reference still. In this paper, a multi-classifier framework is proposed for robust still-to-video FR based on multiple and diverse face representations of a single reference face still. During enrollment of a target individual, the single reference face still is modeled using an ensemble of SVM classifiers based on different patches and face descriptors. Multiple feature extraction techniques are applied to patches isolated in the reference still to generate a diverse SVM pool that provides robustness to common nuisance factors (e.g., variations in illumination and pose). The estimation of discriminant feature subsets, classifier parameters, decision thresholds, and ensemble fusion functions is achieved using the high-quality reference still and a large number of faces captured in lower quality video of non-target individuals in the scene. During operations, the most competent subset of SVMs are dynamically selected according to capture conditions. Finally, a head-face tracker gradually regroups faces captured from different people appearing in a scene, while each individual-specific ensemble performs face matching. The accumulation of matching scores per face track leads to a robust spatio-temporal FR when accumulated ensemble scores surpass a detection threshold. Experimental results obtained with the Chokepoint and COX-S2V datasets show a significant improvement in performance w.r.t. reference systems, especially when individual-specific ensembles (1) are designed using exemplar-SVMs rather than one-class SVMs, and (2) exploit score-level fusion of local SVMs (trained using features extracted from each patch), rather than using either decision-level or feature-level fusion with a global SVM (trained by concatenating features extracted from patches).
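
The abstract's score-level fusion and per-track score accumulation might look roughly like the sketch below. This is an illustration only, not the authors' implementation: averaging as the fusion rule, the sliding window of 3 frames, and the names `fuse_scores` and `first_detection` are all assumptions, since the abstract does not specify the fusion function or how scores are accumulated.

```python
from typing import List, Optional, Sequence


def fuse_scores(patch_scores: Sequence[float]) -> float:
    """Score-level fusion for one frame (assumed rule: mean of per-patch scores)."""
    if not patch_scores:
        raise ValueError("need at least one patch score")
    return sum(patch_scores) / len(patch_scores)


def first_detection(
    track: Sequence[Sequence[float]], threshold: float, window: int = 3
) -> Optional[int]:
    """Return the index of the first frame at which the sum of the last
    `window` fused scores exceeds `threshold`, or None if it never does.

    `track` holds one list of per-patch classifier scores per frame.
    The window size is a hypothetical parameter chosen for illustration.
    """
    fused: List[float] = []
    for i, frame_scores in enumerate(track):
        fused.append(fuse_scores(frame_scores))
        if sum(fused[-window:]) > threshold:
            return i
    return None


# Toy face track: four frames, two patch scores per frame.
track = [[0.2, 0.4], [0.6, 0.8], [0.9, 0.7], [0.8, 1.0]]
detected_at = first_detection(track, threshold=2.0)
```

Accumulating over several frames means a single low-quality frame is less likely to trigger or suppress a detection on its own, which is the intuition behind trajectory-level matching.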

“…A natural way to deal with this problem is to learn a common mapping space for the polymorphous samples. Typically, Huang et al. [7] proposed an improved LDA [8] to learn projections by using partial weighting to emphasize cross-scenario images in the discriminant analysis. One shortcoming of this method is its reliance on a single mapping to build a common discriminant space for samples in different scenarios.…”

Section: Introduction (mentioning)

confidence: 99%

Point-Manifold Discriminant Analysis for Still-to-Video Face Recognition

Xue, Wang, Xiao, et al. 2014

IEICE Trans. Inf. & Syst.

Summary. In Still-to-Video (S2V) face recognition, only a few high resolution images are registered for each subject, while the probe is video clips of complex variations. As faces present distinct characteristics under different scenarios, recognition in the original space is obviously inefficient. Thus, in this paper, we propose a novel discriminant analysis method to learn separate mappings for different scenario patterns (still, video), and further pursue a common discriminant space based on these mappings. Concretely, by modeling each video as a manifold and each image as point data, we form the scenario-oriented mapping learning as a Point-Manifold Discriminant Analysis (PMDA) framework. The learning objective is formulated by incorporating the intra-class compactness and inter-class separability for good discrimination. Experiments on the COX-S2V dataset demonstrate the effectiveness of the proposed method.
