In this paper, we propose a new deep framework that predicts facial attributes and leverages them as a soft modality to improve face identification performance. Our model is an end-to-end framework consisting of a convolutional neural network (CNN) whose output is fanned out into two separate branches: the first branch predicts facial attributes while the second branch identifies face images. In contrast to existing multi-task methods, which only use a shared CNN feature space to train the two tasks jointly, we fuse the predicted attributes with the features from the face modality to improve face identification performance. Experimental results show that our model benefits both face identification and facial attribute prediction, especially for identity-related facial attributes such as gender. We tested our model on two standard datasets annotated with identities and facial attributes. Experimental results indicate that the proposed model outperforms most existing face identification and attribute prediction methods.
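As a concrete illustration of the two-branch design described above, the following PyTorch sketch shows a shared CNN trunk feeding an attribute head and an identity head, with the predicted attributes fused back into the face features before identification. The layer sizes, the sigmoid on the attribute logits, and fusion by concatenation are assumptions for illustration, not the paper's exact configuration.

```python
# Hypothetical sketch (PyTorch) of the two-branch architecture: a shared CNN
# trunk feeds an attribute head and an identity head, and the predicted
# attributes are fused back into the face features before identification.
import torch
import torch.nn as nn

class AttributeFusionNet(nn.Module):
    def __init__(self, num_attributes=40, num_identities=1000, feat_dim=512):
        super().__init__()
        # Shared convolutional trunk (stand-in for the paper's CNN backbone).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # Branch 1: multi-label facial attribute prediction.
        self.attr_head = nn.Linear(feat_dim, num_attributes)
        # Branch 2: identity classification over the fused representation.
        self.id_head = nn.Linear(feat_dim + num_attributes, num_identities)

    def forward(self, x):
        feat = self.trunk(x)
        attr_logits = self.attr_head(feat)
        # Fuse soft attribute predictions with the face features (assumed
        # fusion-by-concatenation; the paper's exact fusion may differ).
        fused = torch.cat([feat, torch.sigmoid(attr_logits)], dim=1)
        return attr_logits, self.id_head(fused)

# Joint training would combine an attribute loss and an identity loss, e.g.:
#   loss = bce_loss(attr_logits, attr_targets) + ce_loss(id_logits, id_labels)
```

Concatenating the soft attribute predictions lets the identity classifier exploit attributes such as gender directly, which is the fusion effect the abstract credits for the identification gains.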
Face sketch-photo synthesis is a critical application in law enforcement and the digital entertainment industry, where the goal is to learn the mapping between a face sketch image and its corresponding photo-realistic image. However, the limited amount of paired sketch-photo training data usually prevents current frameworks from learning a robust mapping between the geometry of sketches and their matching photo-realistic images. Consequently, in this work, we present an approach for learning to synthesize a photo-realistic image from a face sketch in an unsupervised fashion. In contrast to current unsupervised image-to-image translation techniques, our framework leverages a novel perceptual discriminator to learn the geometry of the human face. Learning facial prior information empowers the network to remove geometrical artifacts in the face sketch. We demonstrate that simultaneously optimizing the face photo generator network with the proposed perceptual discriminator in combination with a texture-wise discriminator results in a significant improvement in the quality and recognition rate of the synthesized photos. We evaluate the proposed network through extensive experiments on multiple baseline sketch-photo datasets.
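The two-discriminator objective described in this abstract can be illustrated as follows. In this hypothetical PyTorch sketch, a texture-wise discriminator judges the synthesized photo directly, while a perceptual discriminator judges features from a face feature extractor, standing in for the facial-geometry prior. All networks below are tiny stand-ins; the paper's actual generator, discriminators, and feature network are not specified in this abstract.

```python
# Minimal sketch (PyTorch) of a generator trained against two discriminators:
# a texture-wise discriminator on the synthesized photo and a perceptual
# discriminator on features from a (normally pretrained, frozen) face feature
# extractor. All networks below are tiny stand-ins, not the paper's models.
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

G = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1))           # sketch -> photo
D_tex = nn.Sequential(nn.Conv2d(3, 1, 4, stride=2),        # texture realism
                      nn.Flatten(), nn.LazyLinear(1))
feat_net = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1),   # face features
                         nn.ReLU())                        # (assumed frozen)
D_perc = nn.Sequential(nn.Conv2d(64, 1, 4, stride=2),      # geometry realism
                       nn.Flatten(), nn.LazyLinear(1))     # on features

def generator_loss(sketch):
    fake_photo = G(sketch)
    tex_logits = D_tex(fake_photo)
    perc_logits = D_perc(feat_net(fake_photo))
    # The generator tries to fool both discriminators simultaneously: the
    # texture term rewards realistic appearance, the perceptual term rewards
    # plausible facial geometry in feature space.
    return (bce(tex_logits, torch.ones_like(tex_logits)) +
            bce(perc_logits, torch.ones_like(perc_logits)))

loss = generator_loss(torch.randn(2, 1, 64, 64))  # e.g., grayscale sketches
```

Training the generator against both discriminators at once is what the abstract credits for removing geometric artifacts while keeping texture realistic.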
Cross-modal hashing maps heterogeneous multimedia data into a common Hamming space, enabling fast and flexible retrieval across different modalities. In this paper, we propose a novel cross-modal hashing architecture, deep neural decoder cross-modal hashing (DNDCMH), which uses a binary vector specifying the presence of certain facial attributes as an input query to retrieve relevant face images from a database. The DNDCMH network consists of two separate components: an attribute-based deep cross-modal hashing (ADCMH) module, which uses a margin (m)-based loss function to efficiently learn compact binary codes that preserve similarity between modalities in the Hamming space, and a neural error-correcting decoder (NECD), an error-correcting decoder implemented with a neural network. The goal of the NECD network in DNDCMH is to error-correct the hash codes generated by ADCMH so as to improve retrieval efficiency. The NECD network is trained such that its error-correcting capability is greater than or equal to the margin (m) of the margin-based loss function. As a result, the NECD can correct hash codes generated by ADCMH that are corrupted by up to a Hamming distance of m. We have evaluated and compared DNDCMH with state-of-the-art cross-modal hashing methods on standard datasets to demonstrate the superiority of our method.

In DCMH [22], the inter-modal triplet embedding loss encourages heterogeneous correlation across different modalities, while the intra-modal triplet loss encodes the discriminative power of the hash codes. Moreover, a regularization loss applies adjacency consistency to ensure that the hash codes preserve the original similarities in Hamming space. However, with margin-based loss functions, some instances of different modalities of the same subject may not be close enough in Hamming space to guarantee all the correct retrievals. Therefore, it is important to bring the different modalities of the same subject closer together in Hamming space to improve retrieval efficiency. In this work, we observe that in addition to the regular DCMH techniques [13], [24], [25], which exploit entropy maximization and quantization losses in the DCMH objective function, an error-correcting code (ECC) decoder can be used as an additional component to compensate for the heterogeneity gap and reduce the Hamming distance between the different modalities of the same subject, thereby improving cross-modal retrieval efficiency. We presume that the hash code generated by DCMH is a binary vector lying within a certain distance of a codeword of an ECC. When this hash code is passed through an ECC decoder, the closest codeword is found and used as the final hash code for retrieval. In this process, the attribute hash code and the image hash code of the same subject are forced to map to the same codeword, thereby reducing the distance between the corresponding hash codes. This brings more relevant facial images ...
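The "snap-to-codeword" effect of the ECC decoding step can be illustrated with a toy example. The sketch below uses a small invented codebook and brute-force nearest-codeword decoding in place of the paper's neural decoder (NECD); the code length, codewords, and margin are hypothetical. The toy code has minimum Hamming distance 3, so it corrects one bit error, matching a margin of m = 1.

```python
# Toy illustration (NumPy) of the ECC "snap-to-codeword" step. Brute-force
# nearest-codeword decoding stands in for the neural decoder (NECD); the
# codebook and code length are invented for illustration.
import numpy as np

codebook = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
])

def hamming(a, b):
    return int(np.sum(a != b))

def decode(hash_code):
    # Return the codeword closest to the hash code in Hamming distance.
    dists = [hamming(hash_code, c) for c in codebook]
    return codebook[int(np.argmin(dists))]

# Imperfect ADCMH hash codes for two modalities of the same subject: each is
# the same underlying codeword corrupted in a different bit (distance <= m).
true_codeword = codebook[1]
image_code = true_codeword.copy(); image_code[0] ^= 1
attribute_code = true_codeword.copy(); attribute_code[3] ^= 1

# Before decoding the two modalities disagree; after decoding both snap to
# the shared codeword, so their Hamming distance drops to zero.
assert hamming(image_code, attribute_code) == 2
assert hamming(decode(image_code), decode(attribute_code)) == 0
```

Because the NECD's correcting capability is at least the margin m, any ADCMH code within distance m of the true codeword decodes to it, which is why both modalities of the same subject end up with identical final hash codes.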