Soft biometrics are human-describable, distinguishing characteristics. We present a baseline solution to the problem of identifying individuals solely from human descriptions, by automatically retrieving soft biometric labels from images. Probe images are then identified from a gallery of known soft biometric signatures using their predicted labels. We investigate four labelling techniques and a number of challenging re-identification scenarios with this method. We also present a novel dataset, SoBiR, consisting of 8 camera viewpoints, 100 subjects and 4 forms of comprehensive human annotation to facilitate soft biometric retrieval. We report the increased retrieval accuracy of binary labels, the generalising capability of continuous measurements and the overall performance improvement of comparative annotations over categorical annotations.
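As a hedged illustration of the gallery-matching step described above, the sketch below ranks gallery subjects by Euclidean distance between a probe's predicted soft biometric vector and each stored signature; the distance metric, dimensionality and toy values are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch: identifying a probe by matching its predicted soft
# biometric labels against a gallery of known signatures (values are
# illustrative, not drawn from the SoBiR dataset).
import numpy as np

def rank_gallery(probe_labels, gallery_signatures):
    """Return gallery indices sorted from best to worst match.

    probe_labels       : (D,)   predicted soft biometric vector for one probe image
    gallery_signatures : (N, D) known soft biometric signatures, one per subject
    """
    # Euclidean distance in label space; smaller means a closer description (assumed metric).
    dists = np.linalg.norm(gallery_signatures - probe_labels, axis=1)
    return np.argsort(dists)

# Toy usage: 3 gallery subjects described by 4 soft biometric values.
gallery = np.array([[0.2, 0.8, 0.5, 0.1],
                    [0.9, 0.1, 0.4, 0.7],
                    [0.3, 0.7, 0.6, 0.2]])
probe = np.array([0.25, 0.75, 0.55, 0.15])
print(rank_gallery(probe, gallery))  # -> [0 2 1]
```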
Soft biometrics provide cues that enable human identification from low-quality video surveillance footage. This paper discusses a new crowdsourced dataset, collecting comparative soft biometric annotations from a rich set of human annotators. We now include gender as a comparative trait, and find that comparative labels are more objective and yield more accurate measurements than previous categorical labels. Using our pragmatic dataset, we perform semantic recognition by inferring relative biometric signatures. This demonstrates a practical scenario, reproducing responses from a video surveillance operator searching for an individual. The experiment is guaranteed to return the correct match in the top 7% of results with 10 comparisons, or the top 13% of results using just 5 sets of subject comparisons.
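The operator scenario can be pictured with the following illustrative sketch, which scores gallery identities by how many of a witness's pairwise comparisons their relative measurements satisfy; the trait indices, scoring rule and toy data are assumptions, not the paper's exact inference procedure.

```python
# Hypothetical sketch of the operator scenario: the target is compared against
# a handful of gallery subjects on each trait, and gallery identities are
# ranked by how well their relative measurements agree with those comparisons.
import numpy as np

def rank_by_comparisons(gallery, comparisons):
    """gallery     : (N, D) relative trait measurements per gallery subject
    comparisons : list of (ref_idx, trait_idx, sign); sign=+1 means the target
                  is greater than subject ref_idx on that trait, sign=-1 less.
    Returns gallery indices sorted from most to least consistent."""
    scores = np.zeros(len(gallery))
    for ref, trait, sign in comparisons:
        # A candidate is consistent if it sits on the stated side of the reference.
        scores += sign * (gallery[:, trait] - gallery[ref, trait]) > 0
    return np.argsort(-scores)

# Toy usage: 4 subjects, 2 traits (e.g. height, build); 3 witness comparisons.
g = np.array([[0.9, 0.4], [0.2, 0.8], [0.6, 0.5], [0.8, 0.3]])
c = [(1, 0, +1), (2, 1, -1), (1, 1, -1)]  # taller than subject 1, slimmer than 2 and 1
print(rank_by_comparisons(g, c))
```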
Recognising human attributes from surveillance footage is widely studied for attribute-based re-identification. However, most works assume coarse, expert-defined categories that are ineffective in describing challenging images. Such brittle representations are limited in discriminative power and hamper the efficacy of learnt estimators. We aim to discover more relevant and precise subject descriptions, improving image retrieval and closing the semantic gap. Inspired by fine-grained and relative attributes, we introduce super-fine attributes, which describe multiple, integral concepts of a single trait as multi-dimensional perceptual coordinates. Crowd prototyping facilitates efficient crowdsourcing of super-fine labels by pre-discovering salient perceptual concepts for prototype matching. We re-annotate gender, age and ethnicity traits from PETA, a highly diverse (19K instances, 8.7K identities) amalgamation of 10 re-id datasets including VIPER, CUHK and TownCentre. Employing joint attribute regression with the ResNet-152 CNN, we demonstrate substantially improved ranked retrieval performance with super-fine attributes in direct comparison to conventional binary labels, reporting up to an 11.2% and 14.8% mAP improvement for gender and age respectively, further surpassed by ethnicity. We also find our 3 super-fine traits outperform 35 binary attributes by 6.5% mAP for subject retrieval in a challenging zero-shot identification scenario.
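A minimal PyTorch sketch of joint attribute regression on a ResNet-152 backbone, in the spirit of the setup described above, follows; the per-trait output dimensionalities and the summed MSE objective are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: ResNet-152 features shared across per-trait regression heads
# that predict multi-dimensional perceptual coordinates (dimensions assumed).
import torch
import torch.nn as nn
from torchvision import models

class JointAttributeRegressor(nn.Module):
    def __init__(self, dims=(2, 3, 4)):          # assumed coordinate sizes per trait
        super().__init__()
        backbone = models.resnet152(weights=None)  # ImageNet weights in practice
        feat_dim = backbone.fc.in_features         # 2048 for ResNet-152
        backbone.fc = nn.Identity()                # expose pooled features
        self.backbone = backbone
        # One regression head per trait, predicting its perceptual coordinates.
        self.heads = nn.ModuleList(nn.Linear(feat_dim, d) for d in dims)

    def forward(self, x):
        features = self.backbone(x)
        return [head(features) for head in self.heads]

model = JointAttributeRegressor()
imgs = torch.randn(2, 3, 224, 224)                 # dummy batch
preds = model(imgs)
# Joint training: sum an MSE loss over the traits (targets here are random).
targets = [torch.randn_like(p) for p in preds]
loss = sum(nn.functional.mse_loss(p, t) for p, t in zip(preds, targets))
loss.backward()
```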
Automatically describing pedestrians in surveillance footage is crucial to facilitating human-accessible solutions for suspect identification. We aim to identify pedestrians based solely on human description, by automatically retrieving semantic attributes from surveillance images, alleviating exhaustive label annotation. This work unites a deep learning solution with relative soft biometric labels to accurately retrieve more discriminative image attributes. We propose a Semantic Retrieval Convolutional Neural Network to investigate automatic retrieval of three soft biometric modalities, across a number of 'closed-world' and 'open-world' re-identification scenarios. Findings suggest that relative-continuous labels are more accurately predicted than absolute-binary and relative-binary labels, improving semantic identification in every scenario. Furthermore, we demonstrate a top rank-1 improvement of 23.2% and 26.3% over a traditional, baseline retrieval approach in one-shot and multi-shot re-identification scenarios respectively.
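A hedged sketch of how rank-1 accuracy might be computed in such scenarios follows: predicted label vectors for probes are matched against gallery signatures, and multi-shot probes are formed here by simply averaging several predictions per identity (an assumption, not necessarily the paper's protocol).

```python
# Illustrative rank-1 evaluation for semantic re-identification; the distance
# metric and multi-shot averaging are assumptions for the sketch.
import numpy as np

def rank1_accuracy(probe_vecs, probe_ids, gallery_vecs, gallery_ids):
    """Fraction of probes whose true identity is the single closest gallery entry."""
    correct = 0
    for vec, pid in zip(probe_vecs, probe_ids):
        dists = np.linalg.norm(gallery_vecs - vec, axis=1)
        correct += gallery_ids[np.argmin(dists)] == pid
    return correct / len(probe_ids)

def multi_shot_probe(vecs_for_one_identity):
    # Multi-shot: average the per-image label predictions for one probe identity.
    return np.mean(vecs_for_one_identity, axis=0)
```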
A fusion approach to person recognition is presented here, outlining the automated recognition of targets from human descriptions of face, body and clothing. Three novel results are highlighted. First, the present work stresses the value of comparative descriptions (he is taller than…) over categorical descriptions (he is tall). Second, it stresses the primacy of the face over body and clothing cues for recognition. Third, the present work unequivocally demonstrates the benefit gained through the combination of cues: recognition from face, body and clothing taken together far outstrips recognition from any of the cues in isolation. Moreover, recognition from body and clothing taken together nearly equals the recognition possible from the face alone. These results are discussed with reference to the intelligent fusion of information within police investigations. However, they also signal a potential new era in which automated descriptions could be provided without the need for human witnesses at all.
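The cue-combination idea can be sketched as a weighted score-level fusion over per-cue match scores; equal weights and the toy similarity values below are assumptions, as the study's precise fusion scheme is not specified here.

```python
# Hypothetical sketch: combine face, body and clothing match scores per
# gallery identity by a weighted sum, then rank identities by the fused score.
import numpy as np

def fuse_and_rank(cue_scores, weights=None):
    """cue_scores: {'face': (N,), 'body': (N,), 'clothing': (N,)} similarity
    scores against N gallery identities; higher means a better match."""
    weights = weights or {cue: 1.0 for cue in cue_scores}   # equal weights assumed
    fused = sum(weights[cue] * np.asarray(s) for cue, s in cue_scores.items())
    return np.argsort(-fused)                                # best match first

scores = {'face': [0.7, 0.2, 0.5], 'body': [0.4, 0.3, 0.6], 'clothing': [0.5, 0.1, 0.4]}
print(fuse_and_rank(scores))  # -> [0 2 1]
```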