This paper presents a novel way to perform multi-modal face recognition. We use Partial Least Squares (PLS) to linearly map images in different modalities to a common intermediate subspace in which images of the same individual are highly correlated.

Introduction

In face recognition, one often seeks to compare gallery images taken under one set of conditions to a probe image acquired differently. For example, in criminal investigations, we might need to compare mug shots to a sketch drawn by a sketch artist based on a verbal description of the suspect. Similarly, mug shots or passport photos might be compared to surveillance images taken from a different viewpoint. The probe image might also be of lower resolution (LR) than a gallery of high-resolution (HR) images.

We propose a general framework that uses Partial Least Squares (PLS) [16] to perform recognition in a wide range of multi-modal scenarios. PLS has been used very effectively for face recognition, but in a different manner and with different motivation [17,19,20,21,22]; our contribution is to show how and why PLS can be used for cross-modal recognition. More generally, we argue for the applicability of linear projection to an intermediate subspace for multi-modal recognition, also pointing out the value of the Bilinear Model (BLM) [14] for face recognition, which achieves state-of-the-art results on some problems. Experimental evaluation of our framework using PLS with pose variation has shown significant improvements in accuracy and run-time over the state of the art on the CMU PIE face data set [26]. For sketch-photo recognition, our method is comparable to the state of the art. We also illustrate the potential of our method to handle variation in resolution with a simple, synthetic example. In all three domains we apply exactly the same algorithm and use the same, simple representation of images. Our generic approach performs near or better than state-of-the-art approaches designed for specific cross-modal conditions.

Our approach matches probe and gallery images by linearly projecting them into an intermediate space where images with the same identity are highly correlated (Figure 1), as illustrated in the sketch below. We argue that for a variety of cross-modality recognition problems, such projections exist and can be found using PLS and BLM. One consequence of our approach is that we do not need to synthesize an artificial gallery image from the probe image.

Related Work

There has been a huge amount of prior work on comparing images taken in different modalities, which we
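The following is a minimal, hypothetical sketch of the projection-and-match scheme described above. It substitutes scikit-learn's PLSCanonical for the paper's PLS formulation and assumes paired training images flattened to feature vectors; it is an illustration under those assumptions, not the authors' implementation.

```python
# Hypothetical sketch of cross-modal matching via PLS. Assumption: we stand in
# scikit-learn's PLSCanonical for the paper's PLS; rows of the inputs are
# paired training images of the same subjects, flattened to vectors.
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

def fit_pls(train_mod_a, train_mod_b, n_components=20):
    """Learn linear maps that project two modalities into a common
    intermediate subspace where paired samples are highly correlated."""
    pls = PLSCanonical(n_components=n_components)
    pls.fit(train_mod_a, train_mod_b)
    return pls

def match(pls, gallery_a, probes_b):
    """Project the gallery (modality A) and the probes (modality B) into the
    latent space and match each probe to the gallery entry with the highest
    normalized correlation -- no synthetic gallery image is ever rendered."""
    ga, pb = pls.transform(gallery_a, probes_b)
    ga /= np.linalg.norm(ga, axis=1, keepdims=True)
    pb /= np.linalg.norm(pb, axis=1, keepdims=True)
    return np.argmax(pb @ ga.T, axis=1)  # best gallery index per probe
```

Because matching happens directly in the shared latent space, the same code applies unchanged whether modality B is a different pose, a sketch, or a low-resolution image.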
With the advent of affordable depth sensors, 3D capture is becoming increasingly ubiquitous and has already made its way into commercial products. Yet capturing the geometry or complete shapes of everyday objects using scanning devices (e.g. Kinect) still comes with several challenges that result in noisy or even incomplete shapes. Recent success in deep learning has shown how to learn complex shape distributions in a data-driven way from large-scale 3D CAD model collections and to utilize them for 3D processing on volumetric representations, thereby circumventing problems of topology and tessellation. Prior work has shown encouraging results on problems ranging from shape completion to recognition. We provide an analysis of such approaches and discover that both the training and the resulting representation are strongly and unnecessarily tied to the notion of object labels. Thus, we propose a fully convolutional volumetric autoencoder that learns a volumetric representation from noisy data by estimating voxel occupancy grids. The proposed method outperforms prior work on challenging tasks like denoising and shape completion. We also show that the obtained deep embedding gives competitive performance when used for classification and promising results for shape interpolation.
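As an illustration of the idea rather than the paper's architecture, the sketch below assumes PyTorch and toy 32x32x32 occupancy grids: a fully convolutional 3D encoder-decoder trained to predict clean voxel occupancy from corrupted input, with no object labels involved.

```python
# Minimal sketch (assumptions: PyTorch, 32^3 occupancy grids, toy data) of a
# fully convolutional volumetric autoencoder trained to reconstruct voxel
# occupancy from corrupted input -- label-free, as the abstract describes.
import torch
import torch.nn as nn

class VolumetricAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided 3D convolutions compress the occupancy grid.
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample back to the input size.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))  # logits over voxel occupancy

model = VolumetricAE()
noisy = torch.rand(8, 1, 32, 32, 32).round()  # toy corrupted occupancy grids
clean = torch.rand(8, 1, 32, 32, 32).round()  # toy reconstruction targets
loss = nn.BCEWithLogitsLoss()(model(noisy), clean)
loss.backward()
```

The all-convolutional design is what makes the representation independent of any fixed label set: the network only ever sees occupancy values in and occupancy values out.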
3D human pose estimation from a single image is a challenging problem, especially for in-the-wild settings, due to the lack of 3D annotated data. We propose two anatomically inspired loss functions and use them with the weakly-supervised learning framework of [41] to jointly learn from large-scale in-the-wild 2D and indoor/synthetic 3D data. We also present a simple temporal network that exploits temporal and structural cues present in predicted pose sequences to temporally harmonize the pose estimates. We carefully analyze the proposed contributions through loss surface visualizations and sensitivity analysis to facilitate a deeper understanding of their working mechanism. Our complete pipeline improves the state of the art by 11.8% and 12% on Human3.6M and MPI-INF-3DHP, respectively, and runs at 30 FPS on a commodity graphics card.
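The abstract does not spell out the two loss functions, so the sketch below is purely hypothetical: one plausible anatomically inspired penalty that encourages mirrored left/right bone lengths to agree in a predicted skeleton. The joint indices are assumptions for an arbitrary 17-joint layout, not the paper's.

```python
# Hypothetical illustration (not the paper's exact losses) of an anatomically
# inspired penalty: mirrored left/right limb bones in a predicted 3D skeleton
# should have matching lengths. Indices assume a made-up 17-joint layout.
import torch

LEFT_BONES  = [(11, 12), (12, 13)]  # e.g. left shoulder-elbow, elbow-wrist
RIGHT_BONES = [(14, 15), (15, 16)]  # mirrored right-side bones

def symmetry_loss(pose3d):
    """pose3d: (batch, joints, 3) predicted 3D joint positions."""
    def lengths(bones):
        return torch.stack([(pose3d[:, a] - pose3d[:, b]).norm(dim=-1)
                            for a, b in bones], dim=1)
    return (lengths(LEFT_BONES) - lengths(RIGHT_BONES)).abs().mean()

loss = symmetry_loss(torch.randn(4, 17, 3))  # toy batch of skeletons
```

A penalty of this form needs no 3D ground truth, which is what lets such constraints supplement in-the-wild 2D supervision.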