Existing approaches to person re-identification (re-id) are dominated by supervised learning methods, which require a large number of manually labelled pairs of person images across every pair of camera views. This limits their ability to scale to large camera networks. To overcome this problem, a novel unsupervised re-id model, Generative Topic Saliency (GTS), is proposed in this paper for localised human appearance saliency selection in re-id by exploiting unsupervised generative topic modelling. It yields state-of-the-art re-id performance against existing unsupervised learning based re-id methods; compared with supervised methods, it retains comparable re-id accuracy without any need for pairwise labelled training data.

We are motivated by an intuitive principle: humans often identify people by their salient appearance traits and ignore the more common ones. Compared to the pioneering work of [2], which also learns appearance saliency for re-id, our model has two advantages. (1) Interpretability: our work explicitly models human appearances and backgrounds by learning a set of latent topics corresponding to localised human appearance components and to image backgrounds, so that background regions are not mistaken as distractors in true foreground local salient region discovery. In addition, by associating saliency with atypical human appearances, the learned saliency is more interpretable to humans. (2) Complexity: only a single model is needed to compute saliency for all the images in a camera view, instead of learning a separate discriminative saliency model (k-NN or one-class SVM) for every patch of every image.

Our model is a generalisation of the Latent Dirichlet Allocation (LDA) model [1] with an added spatial variable that makes the learned topics spatially coherent. Given a dataset of M images, each image is factorised (clustered) into a unique combination of K shared topics, with each topic generating its own proportion of words in that image. Conceptually, a topic encodes a distribution of visual words (patches) whose vocabulary and spatial locations reveal certain patterns, in our case the visual characteristics of human appearances and backgrounds. We thus learn two types of latent topics corresponding to foreground and background respectively. Since foreground appearances are in general more 'compact' than backgrounds, we choose a Gaussian distribution to encode foreground human appearance topics and a uniform distribution to encode the more spread-out background topics.

A key objective of our model is to discover salient local foreground patches in a person's image that make the person stand out from other people, i.e. the model seeks not only visually distinctive but also atypical localised appearance characteristics of a person. Specifically, we define a patch P_A's saliency according to three factors: the first is how unlikely this patch is to appear in a training set I_R of J images at the proxi...
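The first saliency factor above, a patch's atypicality relative to a reference set of other people's images, lends itself to a simple illustration. The following is a minimal sketch only: the function name patch_saliency, the k-nearest-neighbour averaging, and the weighting by a foreground-topic posterior are illustrative assumptions of ours, not the exact GTS formulation (which is truncated above).

    import numpy as np

    def patch_saliency(patch_feat, ref_patches, fg_posterior, k=10):
        """Illustrative saliency score for a single patch.

        patch_feat   : (D,) feature vector of the query patch
        ref_patches  : (N, D) features of spatially proximate patches drawn
                       from a reference set I_R of J other images
        fg_posterior : probability that the patch belongs to a foreground
                       (person) topic rather than a background topic
        k            : number of nearest reference patches to average over
        """
        # A patch that is far from its nearest neighbours in other people's
        # images is atypical, hence a candidate salient region.
        dists = np.linalg.norm(ref_patches - patch_feat, axis=1)
        atypicality = np.sort(dists)[:k].mean()

        # Down-weight patches the topic model assigns to background, so
        # background clutter is not mistaken for salient foreground.
        return fg_posterior * atypicality

In this sketch the foreground/background separation comes from the topic model's posterior, which is what allows a single generative model per camera view to replace the per-patch discriminative saliency models (k-NN or one-class SVM) used in [2].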
Current person re-identification (re-id) methods assume that (1) pre-labelled training data are available for every camera pair, and (2) the gallery size remains moderate at deployment. However, both assumptions are invalid in real-world applications, where camera networks and gallery sizes grow dramatically. Under such more realistic conditions, human involvement is often needed to verify the results generated by an automatic algorithm. In this work, rather than proposing another fully automated yet unrealistic re-id model, we introduce a semi-automatic re-id solution. Our goal is to minimise the human effort spent in re-id deployments while maximising re-id performance. Specifically, a hybrid human-computer re-id model based on Human Verification Incremental Learning (HVIL) is formulated; it does not require any pre-labelled training data and is therefore scalable to new camera pairs. Moreover, the HVIL model learns cumulatively from human feedback to provide an instant improvement to the re-id ranking of each probe on-the-fly, and is thus scalable to large gallery sizes. We further formulate a Regularised Metric Ensemble Learning (RMEL) model to combine a series of incrementally learned HVIL models into a single ensemble model to be used when human feedback becomes unavailable. We conduct extensive comparative evaluations on three benchmark datasets (CUHK03, Market-1501, and VIPeR) to demonstrate the advantages of the proposed HVIL re-id model over state-of-the-art conventional human-out-of-the-loop re-id methods and contemporary human-in-the-loop competitors.

Index Terms: person re-identification, human-in-the-loop, human-out-of-the-loop, interactive model learning, human-machine interaction, human labelling effort, human verification, hard negative mining, incremental model learning, metric ensemble.

[Figure: comparison of re-id schemes. (a) Human-out-of-the-loop re-id: offline model training on a large, exhaustively labelled dataset covering all camera pairs (annotation stage), with the trained model deployed to the same camera pairs (deployment stage). (b) POP (post-rank optimisation): human-in-the-loop re-id models optimised in isolation for each probe from user rank/re-rank feedback. (c) HVIL (Human Verification Incremental Learning): human-in-the-loop re-id models optimised incrementally from user rank/re-rank feedback under a limited labour budget, yielding a strong model.]
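To make the human-in-the-loop idea concrete, the following is a minimal sketch of one hypothetical instantiation of an HVIL-style update loop, assuming a Mahalanobis-style metric that is nudged after a human verifies a true match and flags a hard negative for the current probe. The loss, learning rate, and PSD projection are illustrative assumptions of ours; the abstract does not specify HVIL's actual model form or update rule.

    import numpy as np

    def hvil_update(M, probe, verified_match, verified_nonmatch, lr=0.1):
        """One incremental update of a Mahalanobis-style metric M (D x D,
        positive semi-definite) from a single round of human feedback."""
        def outer_diff(a, b):
            d = (a - b).reshape(-1, 1)
            return d @ d.T

        # Pull the human-verified match closer, push the hard negative away.
        grad = outer_diff(probe, verified_match) - outer_diff(probe, verified_nonmatch)
        M = M - lr * grad

        # Project back onto the PSD cone so M remains a valid metric.
        w, V = np.linalg.eigh(M)
        return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

    def rank_gallery(M, probe, gallery):
        """Re-rank gallery images (N, D) by Mahalanobis distance to the probe."""
        diffs = gallery - probe
        dists = np.einsum('nd,dk,nk->n', diffs, M, diffs)
        return np.argsort(dists)

The point of the sketch is the interaction pattern rather than the specific metric: each verification immediately re-ranks the gallery for the current probe, and the sequence of updated metrics is what an RMEL-style ensemble would later combine for use without human feedback.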
Most existing person re-identification (ReID) methods assume the availability of extensively labelled cross-view person pairs and a closed-set scenario (i.e. all probe people exist in the gallery set). These two assumptions significantly limit their usefulness and scalability in real-world applications, particularly with large-scale camera networks. To overcome these limitations, we introduce a more challenging yet realistic ReID setting, termed OneShot-OpenSet-ReID, and propose a novel Regularised Kernel Subspace Learning model for ReID under this setting. Our model differs significantly from existing ReID methods in its ability to effectively learn cross-view identity-specific information from unlabelled data alone, and in its flexibility to naturally accommodate pairwise labels when they are available.
We propose a novel Generalized Zero-Shot Learning (GZSL) method that is agnostic to both unseen images and unseen semantic vectors during training. Prior works in this context propose to map high-dimensional visual features to the semantic domain, which we believe contributes to the semantic gap. To bridge the gap, we propose a novel low-dimensional embedding of visual instances that is "visually semantic." Analogous to semantic data that quantify the existence of an attribute in the presented instance, the components of our visual embedding quantify the existence of a prototypical part-type in the presented instance. In parallel, as a thought experiment, we quantify the impact of noisy semantic data by utilizing a novel visual oracle to visually supervise a learner. These factors, namely semantic noise, the visual-semantic gap, and label noise, lead us to propose a new graphical model for inference with pairwise interactions between the label, the semantic data, and the inputs. We tabulate results on a number of benchmark datasets, demonstrating significant improvements in accuracy over the state of the art under both semantic and visual supervision.
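As a rough illustration of the "visually semantic" embedding described above, the following sketch scores each of K prototypical part-types by its strongest response among an image's local features, then classifies by the nearest class semantic vector. The prototype dictionary, cosine scoring, and max-pooling are illustrative assumptions of ours rather than the paper's construction, and the nearest-vector step presupposes that the number of part-types matches the semantic vector dimension.

    import numpy as np

    def visually_semantic_embedding(features, prototypes):
        """Map local visual features (R, D) of one image to a low-dimensional
        embedding whose k-th component scores how strongly the k-th
        prototypical part-type (from a (K, D) prototype dictionary) is present."""
        f = features / np.linalg.norm(features, axis=1, keepdims=True)
        p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
        sim = f @ p.T              # (R, K) cosine similarity to each prototype
        return sim.max(axis=0)     # (K,) part-type presence scores

    def gzsl_predict(embedding, class_semantics):
        """Classify over seen and unseen classes by nearest semantic vector,
        where class_semantics is (C, K) with one attribute vector per class."""
        dists = np.linalg.norm(class_semantics - embedding, axis=1)
        return int(np.argmin(dists))

The design choice mirrored here is that the embedding lives in a low-dimensional, attribute-like space, so unseen classes can be scored directly against their semantic vectors without mapping high-dimensional visual features into the semantic domain.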