Most studies of content-based video copy detection (CBCD) concentrate on visual signatures; only a few exploit audio features. Yet the audio track, when present, is an essential component of a video, and integrating visual and acoustic fingerprints can significantly improve copy detection performance. Motivated by this observation, we propose a new framework that jointly employs color-based visual features and audio fingerprints to detect duplicate videos. The framework comprises three stages: first, a novel visual fingerprint is generated from spatio-temporal dominant color features; second, mel-frequency cepstral coefficients (MFCCs) are extracted and compactly represented as acoustic signatures; third, the resulting multimodal signatures are jointly used for the CBCD task through a combination rule and weighting strategies. Experiments on the TRECVID 2008 and 2009 datasets demonstrate the improved performance of the proposed framework over the reference methods across a wide range of video transformations.
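The abstract does not specify the exact combination rule or weights, so as a hedged illustration only: score-level fusion of the two modalities is commonly realized as a weighted sum of per-modality similarity scores, with a threshold deciding whether a candidate is a copy. The function names, weights, and threshold below are illustrative assumptions, not the paper's reported configuration.

```python
def fused_similarity(visual_sim: float, audio_sim: float,
                     w_visual: float = 0.5, w_audio: float = 0.5) -> float:
    """Weighted-sum combination of per-modality similarity scores.

    visual_sim / audio_sim are assumed to be normalized to [0, 1];
    the weights are hypothetical and would be tuned on validation data.
    """
    return w_visual * visual_sim + w_audio * audio_sim


def is_copy(visual_sim: float, audio_sim: float,
            threshold: float = 0.7) -> bool:
    """Declare a copy when the fused score reaches a decision threshold."""
    return fused_similarity(visual_sim, audio_sim) >= threshold
```

In practice the weights let the detector lean on whichever modality survives a given transformation (e.g., audio when the picture is heavily re-encoded), which is the intuition behind combining the two fingerprints.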