Spoken Term Detection (STD) can be regarded as a sub-task of automatic speech recognition that aims to extract partial information from speech signals in the form of query utterances. Many STD techniques in the literature rely on a single source of evidence to decide whether a query utterance matches. In this manuscript, we develop an acoustic signal processing based approach to STD that incorporates techniques for silence removal, dynamic noise filtration, and evidence combination using Dempster-Shafer Theory (DST). A voiced segment detector based on spectral-temporal features and an unvoiced segment detector based on energy and zero-crossing rate are built to remove silence segments from the speech signal. Comprehensive experiments on large speech datasets yielded satisfactory results for the proposed approach. Our approach improves on existing speaker-dependent STD approaches, specifically in the reliability of query utterance spotting, by combining evidence from multiple belief sources.
Keywords: Spoken term detection, Acoustic keyword spotting, Query-by-example, Dempster-Shafer theory, Speech recognition, Speech processing.
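As an illustration of the evidence combination step, the following is a minimal sketch of Dempster's rule of combination for two belief sources over a match/mismatch frame. The frame labels, the function name `combine`, and the mass values are assumptions chosen for illustration; they are not taken from the experiments reported in this manuscript.

```python
from itertools import product

# Hypothetical frame of discernment for one query utterance decision.
MATCH = frozenset({"match"})
MISMATCH = frozenset({"mismatch"})
EITHER = MATCH | MISMATCH  # mass assigned here expresses ignorance


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (focal element) to its mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass falling on the empty set measures disagreement.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Illustrative (made-up) masses from two belief sources.
source_a = {MATCH: 0.6, MISMATCH: 0.1, EITHER: 0.3}
source_b = {MATCH: 0.5, MISMATCH: 0.2, EITHER: 0.3}
fused = combine(source_a, source_b)
```

With these illustrative inputs, the fused mass on the match hypothesis is higher than in either source alone, which is the sense in which agreeing sources reinforce each other under this rule.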
Acknowledgement: We are especially grateful to Prof. Daniel Neagu, University of Bradford, whose stimulating suggestions helped us to coordinate this research in terms of statistical analysis and performance evaluation methods.