Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval 2013
DOI: 10.1145/2461466.2461478

Fisher kernel based relevance feedback for multimodal video retrieval

Abstract: This paper proposes a novel approach to relevance feedback based on the Fisher Kernel representation in the context of multimodal video retrieval. The Fisher Kernel representation describes a set of features as the derivative with respect to the log-likelihood of the generative probability distribution that models the feature distribution. In the context of relevance feedback, instead of learning the generative probability distribution over all features of the data, we learn it only over the top retrieved results…
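
As a rough illustration of the idea in the abstract, the sketch below fits a Gaussian mixture only on descriptors from the top retrieved results and then encodes a set of descriptors as the gradient of the mixture's log-likelihood with respect to its means (a simplified Fisher-vector encoding). This is a minimal sketch, not the paper's implementation; names such as `fit_feedback_gmm` and `fisher_vector` are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_feedback_gmm(top_result_features, n_components=8):
    """Fit the generative model only on descriptors pooled from the top retrieved items."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag')
    gmm.fit(np.vstack(top_result_features))
    return gmm

def fisher_vector(descriptors, gmm):
    """Gradient of the GMM log-likelihood w.r.t. the means (simplified Fisher encoding)."""
    X = np.atleast_2d(descriptors)
    T = X.shape[0]
    gamma = gmm.predict_proba(X)                     # soft assignments, shape (T, K)
    diff = X[:, None, :] - gmm.means_[None, :, :]    # (T, K, D)
    grad = (gamma[:, :, None] * diff / gmm.covariances_[None, :, :]).sum(axis=0)
    grad /= (T * np.sqrt(gmm.weights_)[:, None])     # per-component normalisation
    return grad.ravel()                              # concatenate into one vector
```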

Cited by 11 publications (7 citation statements) · References 23 publications

Citation statements (ordered by relevance):
“…The idea is to learn a ranking function by optimizing the pair-wise or list-wise orders between pseudo positive and negative samples. In [19], the relevance judgment over the top-ranked videos is provided by users. Then an SVM is trained using visual features represented in the Fisher vector.…”
Section: Related Work
confidence: 99%
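
A minimal sketch of the feedback-and-re-rank step described in this excerpt, assuming each video is already encoded as a Fisher vector: the user's relevance judgments on the top-ranked videos train a linear SVM, and its decision values re-rank the whole collection. The helper name `feedback_rerank` and the pipeline details are assumptions, not taken from [19].

```python
import numpy as np
from sklearn.svm import LinearSVC

def feedback_rerank(fisher_vectors, judged_indices, user_labels):
    """fisher_vectors: (N, D) array of per-video encodings;
    judged_indices: indices of the top-ranked videos shown to the user;
    user_labels: 1 for relevant, 0 for non-relevant, aligned with judged_indices."""
    clf = LinearSVC(C=1.0)
    clf.fit(fisher_vectors[judged_indices], user_labels)
    scores = clf.decision_function(fisher_vectors)   # relevance score per video
    return np.argsort(-scores)                       # new ranking, best first
```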
“…Other works employed temporal rules with high-level concepts [14]. To the best of our knowledge the only work that used Fisher Kernel to model the temporal variation in videos is [18]. They employed a frame-based global feature descriptor for a movie-genre classification scenario.…”
Section: Related Work
confidence: 99%
“…Each of the 18 points in Fig. 1(a) is the centroid of the bounding box of the corresponding body part of size L (as shown in Fig. 1(b)). The accuracy of the body pose estimation is computed by comparing the positions of the ground-truth bounding box B_i^GT and of the estimated bounding box B_i^E, for each body part i = 1, ..., 18. If the overlap of B_i^E with B_i^GT is more than 80%, the body part is considered as being correctly detected.…”
Section: Groundtruth and Performance Evaluation
confidence: 99%
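
A small sketch of the evaluation rule quoted above, under the assumption that boxes are given as (x1, y1, x2, y2) corners and that "overlap" means the fraction of the ground-truth box covered by the estimated box (the excerpt does not specify the exact overlap measure):

```python
def overlap_ratio(box_est, box_gt):
    """Fraction of the ground-truth box covered by the estimated box."""
    ix1, iy1 = max(box_est[0], box_gt[0]), max(box_est[1], box_gt[1])
    ix2, iy2 = min(box_est[2], box_gt[2]), min(box_est[3], box_gt[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    gt_area = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / gt_area if gt_area else 0.0

def is_correct_detection(box_est, box_gt, threshold=0.8):
    """A body part counts as correctly detected when the overlap exceeds 80%."""
    return overlap_ratio(box_est, box_gt) > threshold
```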
“…This paper extends our previous work [3,4] by including evaluation on a new video dataset, evaluating more feature extraction schemes, analyzing the influence of multiple relevance feedback iterations, and including a computational complexity analysis. To summarize, our main contributions are as follows:…”
Section: Introduction
confidence: 65%