2016
DOI: 10.1007/978-3-319-51811-4_21

Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers

Abstract: The problem of Near-Duplicate Video Retrieval (NDVR) has attracted increasing interest due to the huge growth of video content on the Web, which is characterized by a high degree of near-duplicity. This calls for efficient NDVR approaches. Motivated by the outstanding performance of Convolutional Neural Networks (CNNs) over a wide variety of computer vision problems, we leverage intermediate CNN features in a novel global video representation by means of a layer-based feature aggregation scheme. We perform exten…

Citations: cited by 65 publications (44 citation statements)
References: 19 publications
“…The proposed unsupervised NDVR approach relies on a Bag-of-Words (BoW) scheme [27]. In particular, two aggregation variations are proposed: a vector aggregation where a single codebook of visual words is used, and a layer aggregation where multiple codebooks of visual words are used.…”
Section: Bag-of-words Approach (mentioning)
confidence: 99%
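
The two aggregation variations described in the statement above can be illustrated with a minimal sketch. It assumes each frame is already represented by one feature vector per CNN layer, that codebooks have been learned elsewhere, and that a frame is assigned to its nearest visual word by squared Euclidean distance. The function names (`nearest_word`, `vector_aggregation`, `layer_aggregation`) and the toy data layout are hypothetical and are not taken from the cited work.

```python
from collections import Counter


def nearest_word(vec, codebook):
    """Index of the codebook entry closest to `vec` (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: dist2(vec, codebook[k]))


def histogram(words, n_words):
    """Count how often each visual word index occurs."""
    counts = Counter(words)
    return [counts.get(k, 0) for k in range(n_words)]


def vector_aggregation(frames, codebook):
    """Single codebook: concatenate a frame's layer vectors, then quantize once.

    frames: list of frames, each a list of per-layer feature vectors.
    codebook: list of visual words over the concatenated vector (assumed given).
    """
    words = [
        nearest_word([x for layer_vec in frame for x in layer_vec], codebook)
        for frame in frames
    ]
    return histogram(words, len(codebook))


def layer_aggregation(frames, codebooks):
    """One codebook per layer: quantize each layer separately, concatenate histograms."""
    video_hist = []
    for layer_idx, codebook in enumerate(codebooks):
        words = [nearest_word(frame[layer_idx], codebook) for frame in frames]
        video_hist.extend(histogram(words, len(codebook)))
    return video_hist


# Toy video: 3 frames, 2 layers (dimensions 2 and 1).
frames = [[[0, 0], [0]], [[1, 1], [0]], [[0.9, 1.1], [5]]]
single_codebook = [[0, 0, 0], [1, 1, 0], [1, 1, 5]]
per_layer_codebooks = [[[0, 0], [1, 1]], [[0], [5]]]

vec_hist = vector_aggregation(frames, single_codebook)
layer_hist = layer_aggregation(frames, per_layer_codebooks)
```

In this sketch the layer-based histogram has one bin per word per layer, so its length is the sum of the codebook sizes, whereas the vector-based histogram has one bin per word of the single codebook.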
“…In Section 4.2, we review the related literature in the field of NDVR by providing an outline of the major trends in the field. In Section 4.3, we present the two aforementioned NDVR approaches that have been developed within the InVID project [27,28]. In Section 4.4, we report on the results of a comprehensive experimental study, including a comparison with five state-of-the-art methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, the results show that the best performance is achieved when combining the deep feature descriptor with a global descriptor using Scalable Compressed Fisher Vectors (SCFV) [20]. Recently, an approach for using features from intermediate CNN layers for near-duplicate video retrieval has been proposed [21], showing that the additionally preserved structural information improves matching performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…This leads to specialized solutions that typically exhibit poor performance when used (without tuning) on different video corpora. For instance, some methods learn codebooks [24,1,4,14] or hashing functions [25,26,7] based on sample frames from the evaluation dataset, and as a result their reported retrieval performance is often exaggerated.…”
Section: Introduction (mentioning)
confidence: 99%
“…Motivated by the excellent performance of deep learning in a wide variety of multimedia problems, we are proposing a video-level NDVR approach that incorporates deep learning in two steps. First, we use CNN features from intermediate convolution layers based on a well-known scheme called Maximum Activation of Convolutions [22,34,21], which was recently used for NDVR and led to improved results [14]. Second, we leverage a Deep Metric Learning (DML) framework based on a triplet-wise scheme, which has been shown to be effective in a variety of cases [2,30,29].…”
Section: Introduction (mentioning)
confidence: 99%
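
The two steps named in the statement above can be sketched as follows. Maximum Activation of Convolutions (MAC) is taken here to mean the per-channel maximum over the spatial positions of a convolutional feature map. The triplet term uses a common hinge form with squared Euclidean distances; that form, the `margin` parameter, and the function names are assumptions for illustration, since the quoted text does not specify them.

```python
def mac(feature_map):
    """Per-channel spatial maximum of a feature map given as [channel][row][col]."""
    return [max(max(row) for row in channel) for channel in feature_map]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form): penalize triplets in which the
    negative is not at least `margin` farther from the anchor than the positive."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


# Toy feature map with 2 channels of size 2x2.
fmap = [[[1, 2], [3, 0]], [[-1, -5], [-2, -3]]]
descriptor = mac(fmap)

# A far-away negative yields zero loss; a negative closer than the positive does not.
easy = triplet_loss([0, 0], [1, 0], [3, 0])
hard = triplet_loss([0, 0], [1, 0], [0.5, 0])
```

In a full pipeline the anchor, positive, and negative would be learned embeddings of video descriptors rather than raw toy vectors.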