Recent advances in biometrics, computer vision,
and natural language processing have opened up opportunities for person retrieval from surveillance videos using a textual query. The prime objective of
such a surveillance system is to locate a person using a description, e.g., “a short
woman with a pink t-shirt and white skirt carrying a black purse. She has
brown hair.” Such a description contains attributes like gender, height, type of
clothing, colour of clothing, hair colour, and accessories. Such attributes are
formally known as soft biometrics. They help bridge the semantic gap between
a human description and a machine, because a textual query contains the person’s
soft biometric attributes. Moreover, it is not feasible to search manually through
huge volumes of surveillance footage to retrieve a specific person. Hence, automatic person retrieval using vision- and language-based algorithms is becoming
popular. In comparison to other state-of-the-art reviews, the contributions of
this paper are as follows: 1. It recommends the most discriminative soft biometrics for
specific challenging conditions. 2. It integrates benchmark datasets and retrieval
methods for objective performance evaluation. 3. It provides a complete snapshot of techniques based on features, classifiers, number of soft biometric attributes, type
of deep neural network, and performance measures. 4. It offers comprehensive coverage of person retrieval, from methods based on handcrafted features to
end-to-end approaches based on natural language descriptions.