Re-identification is generally carried out by encoding the appearance of a subject in terms of outfit, which presupposes scenarios where people do not change their attire. In this paper we overcome this restriction by proposing a framework based on a deep convolutional neural network, SOMAnet, that additionally models other discriminative aspects, namely structural attributes of the human figure (e.g. height, obesity, gender). Our method is unique in many respects. First, SOMAnet is based on the Inception architecture, departing from the usual Siamese framework. This spares expensive data preparation (pairing images across cameras) and makes it easier to understand what the network has learned. Second, and most notably, the training data consists of SOMAset, a synthetic dataset of 100K instances created with photorealistic human body generation software. Synthetic data represents a good compromise between realistic imagery, usually not required in re-identification since surveillance cameras capture low-resolution silhouettes, and complete control of the samples, which is useful for customizing the data to the surveillance scenario at hand (e.g. its ethnic distribution). SOMAnet, trained on SOMAset and fine-tuned on recent re-identification benchmarks, outperforms all competitors, matching subjects even when they wear different apparel. The combination of synthetic data with Inception architectures opens up new research avenues in re-identification.
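The training regime implied above, identity classification rather than Siamese pair matching, can be pictured with a short, hypothetical sketch. The identity count, the random stand-in batch and the choice of torchvision's inception_v3 backbone are assumptions made for illustration, not details of the authors' implementation.

import torch
import torch.nn as nn
from torchvision.models import inception_v3

# Hypothetical sketch: re-identification as multi-class identity
# classification over single images, so no cross-camera pairing is needed.
num_identities = 100                               # assumed training-set size
model = inception_v3(weights="IMAGENET1K_V1")      # needs torchvision >= 0.13
model.aux_logits = False                           # drop the auxiliary head
model.AuxLogits = None
model.fc = nn.Linear(model.fc.in_features, num_identities)  # identity head

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One illustrative optimization step on random stand-in data
# (inception_v3 expects 299x299 inputs).
images = torch.randn(8, 3, 299, 299)
labels = torch.randint(0, num_identities, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()

At test time the penultimate-layer activations (or the identity posteriors) can serve as an embedding for matching subjects across cameras by nearest-neighbour search.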
We present a 3D shape retrieval methodology based on the theory of spherical harmonics. Using properties of spherical harmonics, scaling and axial flipping invariance is achieved. Rotation normalization is performed by employing continuous principal component analysis along with a novel approach that applies PCA on the face normals of the model. The 3D model is decomposed into a set of spherical functions that represent not only the intersections of the corresponding surface with rays emanating from the origin but also points along each ray that are closer to the origin than the furthest intersection point. The superior performance of the proposed methodology is demonstrated through a comparison against state-of-the-art approaches on standard databases.
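As a rough illustration of the normals-based rotation normalization mentioned in the abstract, the following numpy sketch applies PCA to the area-weighted face normals of a triangle mesh and rotates the model into the resulting principal frame. The mesh layout (vertices, faces) and the area weighting are assumptions made for the example, not details drawn from the paper.

import numpy as np

def normalize_rotation_by_face_normals(vertices, faces):
    """Rotate a triangle mesh into the principal frame of its face normals.

    vertices: (V, 3) float array; faces: (F, 3) integer index array.
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    cross = np.cross(v1 - v0, v2 - v0)            # unnormalized face normals
    areas = 0.5 * np.linalg.norm(cross, axis=1)   # triangle areas
    norms = np.linalg.norm(cross, axis=1, keepdims=True)
    normals = cross / np.maximum(norms, 1e-12)    # guard degenerate faces

    # Area-weighted covariance of the normals; normals carry no preferred
    # sign, so the covariance (rather than the mean) encodes orientation.
    C = (normals * areas[:, None]).T @ normals / areas.sum()
    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    R = eigvecs[:, ::-1].T                        # principal axes as rows
    if np.linalg.det(R) < 0:                      # keep a proper rotation
        R[2] *= -1
    return vertices @ R.T                         # model in canonical pose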
A novel method for the classification and retrieval of 3D models is proposed; it exploits the 2D panoramic view representation of 3D models as input to an ensemble of Convolutional Neural Networks which automatically compute the features. The first step of the proposed pipeline, pose normalization, is performed using the SYMPAN method, which also operates on the panoramic view representation. In the training phase, three panoramic views, corresponding to the major axes, are used to train an ensemble of Convolutional Neural Networks. The panoramic views consist of 3-channel images containing the Spatial Distribution Map, the Normals' Deviation Map and the magnitude of the Normals' Deviation Map Gradient Image. The proposed method aims at capturing the feature continuity of 3D models, while simultaneously minimizing data preprocessing via the construction of an augmented image representation. It is extensively tested in terms of classification and retrieval accuracy on two standard large-scale datasets: ModelNet and ShapeNet.

1. Introduction

In the recent past, convolutional neural networks (CNNs) have shown their superiority against humans in computing features, while they are very sensitive to the input representation. In this work an extension of the PANORAMA 3D shape representation, previously proposed by our team (Papadakis et al., 2010), is exploited as the input representation to a CNN for computing descriptor features for 3D object classification and retrieval. The 3D models are initially pose normalized using the SYMPAN pose normalization algorithm (Sfikas et al., 2014), which is based on the use of reflective symmetry on their panoramic view images. Next, an augmented panoramic view is created and used to train the convolutional neural network. This augmented panoramic view consists of the spatial and orientation components of PANORAMA (see 3.1.1), along with the magnitude of the gradient image which is extracted from the orientation component. A reduction in the size of the augmented panoramic view representation is shown to benefit the training procedure.
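A minimal sketch of assembling the augmented 3-channel panoramic input described above, assuming the Spatial Distribution Map (sdm) and the Normals' Deviation Map (ndm) have already been rendered from the pose-normalized model; the Sobel gradient and the per-channel scaling are illustrative choices, not taken from the paper.

import numpy as np
from scipy import ndimage

def augmented_panorama(sdm, ndm):
    """Stack SDM, NDM and the NDM gradient magnitude into an H x W x 3 image."""
    gx = ndimage.sobel(ndm, axis=1)        # horizontal gradient of the NDM
    gy = ndimage.sobel(ndm, axis=0)        # vertical gradient of the NDM
    grad_mag = np.hypot(gx, gy)            # gradient-magnitude channel

    channels = [sdm, ndm, grad_mag]
    # Scale each channel to [0, 1] so the CNN sees comparable ranges.
    channels = [(c - c.min()) / (np.ptp(c) + 1e-8) for c in channels]
    return np.stack(channels, axis=-1)

One such image per major axis (x, y, z) would feed one network of the ensemble, whose predictions are then combined, e.g. by averaging.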