Acknowledgments

The completion of my PhD research programme has been an enjoyable, and at times difficult and complex, journey that has involved a number of people who have assisted me in so many ways throughout the process. The people I thank here are by no means a comprehensive list, as there isn't enough paper for the names I would have to mention.

I would firstly like to thank my supervisory team, Clinton Fookes, Sridha Sridharan, and Simon Denman, for providing valuable guidance throughout the course of my study. I would like to thank Clinton and Sridha for providing me with the opportunity to undertake my PhD, and also for providing the Vision and Signal Processing (VSP) laboratory environment that allowed me to research at a high level and to interact with other PhD students who helped with my work. I would like to make special mention of Simon for his day-to-day support and for being consistently available to help me with my endless questions; without his help, guidance, and encouragement I would not have made it to the end of my PhD. Another member of the VSP discipline that I would like to thank is David Dean, who assisted me multiple times throughout the course of my PhD.

I would also like to thank the other students and the past and present staff within the VSP laboratory. Thank you for the sharing of knowledge, but mostly thank you all for the entertainment and the enjoyable moments that we shared throughout this time.
Abstract

In surveillance and security situations it is often necessary to search for and locate a subject of interest from a verbally or textually supplied subject description. At present, searches using these forms of query are still predominantly performed manually and often ineffectively, via physical searching of a premises or personally watching countless hours of video footage, looking for the specific subject of interest. Surveillance networks are typically monitored by few people viewing several monitors displaying single and multiple camera feeds. In these situations, the ability to translate an externally supplied description into one of computer vision relevance would be of great assistance.

To date, state-of-the-art soft biometric based search techniques primarily include some form of pre-enrollment component, where the target subject is enrolled from one camera and searched for in other cameras, or alternatively enrolled in an earlier instance of the same camera and searched for in later footage. These re-identification techniques exclude, however, instances where the subject has not previously been located and enrolled from the footage.

Soft biometric based semantic search is one technique that provides a solution to this problem. The target query no longer requires scene interrogation and instead can be generated from a textually supplied query. While soft biometrics have obvious limitations, namely non-permanence and an inability to identify an individual uniquely, they are generally simple for a human operator to visually extract and relay to another human operator. They also have th...