This paper presents a template-based system for speaker-independent keyword spotting (KWS) in continuous speech that can help in the automatic analysis, indexing, search and retrieval of user-generated videos by content. Extensive experiments on clean speech confirm that the proposed approach is superior to an HMM approach when applied to noisy speech with different signal-to-noise ratio (SNR) levels. Experiments conducted to detect swear words, personal names and product names within a set of online user-generated video blogs show significantly better recall and precision than a traditional ASR-based approach.
NOISE ROBUST KEYWORD SPOTTING FOR USER GENERATED VIDEO BLOGS