Extended conceptual feedback for semantic multimedia indexing

Hamadi, Abdelkader; Mulhem, Philippe; Quénot, Georges

doi:10.1007/s11042-014-1937-y

Cited by 5 publications

(10 citation statements)

References 16 publications

“…Finally, two other improvement techniques were tried: conceptual re-scoring and use of an uploader model. Conceptual re-scoring is different from conceptual feedback, it is similar to temporal re-scoring but it exploits the semantic similarity between concepts instead of the temporal closeness between video shots [14]. It did not prove useful probably because, even if based on a different method, it captures the same type of information as the conceptual feedback done previously.…”

Section: Fusion and Other Improvement Methods (mentioning)

confidence: 99%

“…• Concept features corresponding to the conceptual feedback approach [14] applied two times. These were originally designed for use with engineered descriptors but they can also include other semantic descriptors; here they have been computed including the Xerox semantic descriptors.…”

Section: Semantic Features (mentioning)

confidence: 99%

“…It is also greater for semantic features than for quasi-semantic ones (corresponding to internal layers). It also appears smaller for conceptual feedback features but this is due to the fact that the feedback features were computed from scores that were already re-scored as it works better like this [14]. Figure 4 shows the performance gain brought by the successive fusion of descriptors of increasing performance.…”

Section: Temporal Re-scoring With Semantic Features (mentioning)

confidence: 99%

“…In all these works and many other similar ones, the semantic features are learned on completely different collections and generally for concepts or categories different from those searched for on the target collection. Hamadi et al. [14] used the approach using the same collection and the same concepts both for the semantic feature training and for their use in a further classification step. In this variant, called "conceptual feedback", a given target concept is learned both from the "low-level" features and from the detection scores of the other target concepts also learned from the same low-level features (the training of the semantic features has to be done by cross-validation within the training set so that it can be used for the second training step both on the training and test sets).…”

Section: Introduction (mentioning)

confidence: 99%
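
The conceptual feedback scheme described in the Introduction statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of Hamadi et al.: the function names (`out_of_fold_scores`, `conceptual_feedback_features`, `centroid_scorer`), the interleaved fold assignment, and the toy centroid-based scorer are all hypothetical choices made for the example. It covers only the training-set side, where each sample is scored by a model that was trained without it.

```python
def out_of_fold_scores(features, labels, train_fn, n_folds):
    """Score every training sample with a model that never saw it.

    train_fn(xs, ys) must return a callable mapping a feature vector
    to a score. Folds are interleaved (sample i goes to fold i % n_folds),
    which is an assumption of this sketch; n_folds should be at least 2.
    """
    n = len(features)
    scores = [None] * n
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        model = train_fn([features[i] for i in kept],
                         [labels[i] for i in kept])
        for i in held_out:
            scores[i] = model(features[i])
    return scores


def conceptual_feedback_features(features, labels_per_concept,
                                 train_fn, n_folds, target):
    """Append out-of-fold scores of all non-target concepts to each sample."""
    augmented = [list(x) for x in features]
    for concept in sorted(labels_per_concept):
        if concept == target:
            continue
        scores = out_of_fold_scores(features, labels_per_concept[concept],
                                    train_fn, n_folds)
        for row, score in zip(augmented, scores):
            row.append(score)
    return augmented


def centroid_scorer(xs, ys):
    """Toy stand-in for a real classifier (hypothetical, for illustration).

    Scores a vector by its dot product with (positive mean - negative mean).
    """
    dim = len(xs[0])

    def mean(rows):
        if not rows:
            return [0.0] * dim
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = mean([x for x, y in zip(xs, ys) if y == 1])
    neg = mean([x for x, y in zip(xs, ys) if y == 0])
    weights = [p - q for p, q in zip(pos, neg)]
    return lambda x: sum(w * v for w, v in zip(weights, x))
```

The augmented vectors would then feed the second training step for the target concept. For the test set, one plausible choice (again an assumption) is to score test samples with first-step models trained on the whole training set.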

Learned features versus engineered features for semantic video indexing

Budnik, Gomez, Safadi, et al. 2015

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)

In this paper, we compare "traditional" engineered (hand-crafted) features (or descriptors) and learned features for content-based semantic indexing of video documents. Learned (or semantic) features are obtained by training classifiers for other target concepts on other data. These classifiers are then applied to the current collection. The vector of classification scores is the new feature used for training a classifier for the current target concepts on the current collection. If the classifiers used on the other collection are of the Deep Convolutional Neural Network (DCNN) type, it is possible to use as a new feature not only the score values provided by the last layer but also the intermediate values corresponding to the output of all the hidden layers. We made an extensive comparison of the performance of such features with traditional engineered ones as well as with combinations of them. The comparison was made in the context of the TRECVid semantic indexing task. Our results confirm those obtained for still images: features learned from other training data generally outperform engineered features for concept recognition. Additionally, we found that directly training SVM classifiers using these features does significantly better than partially retraining the DCNN for adapting it to the new data. We also found that, even though the learned features performed better than the engineered ones, the fusion of both performs significantly better, indicating that engineered features are still useful, at least in this case.

“…The operation continues until no shot with a significant correlation is found. In [14], the authors exploit the idea presented in [8], [9] and propose another approach that consists of generating a descriptor by performing an early fusion of high-level descriptors of shots belonging to a temporal window centered on the current shot. They achieved very interesting results and enhanced a good baseline system.…”

Section: Related Work (mentioning)

confidence: 99%
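
The temporal-window descriptor mentioned in the Related Work statement above can be illustrated with a short sketch. It assumes that "early fusion" means plain concatenation of the per-shot descriptors inside the window, and that positions falling outside the video are filled with a constant; the citation statement specifies neither, and the names `temporal_window_descriptor`, `half_width`, and `pad_value` are hypothetical.

```python
def temporal_window_descriptor(shot_descriptors, half_width, pad_value=0.0):
    """Build one fused descriptor per shot from a centered temporal window.

    shot_descriptors: list of equal-length lists, one per shot, in
    temporal order. The window spans half_width shots on each side of
    the current shot. Out-of-range neighbours are replaced by a vector
    of pad_value (an assumption of this sketch).
    """
    n = len(shot_descriptors)
    dim = len(shot_descriptors[0]) if n else 0
    padding = [pad_value] * dim
    fused = []
    for i in range(n):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            # Concatenate the neighbour's descriptor, or padding at borders.
            row.extend(shot_descriptors[j] if 0 <= j < n else padding)
        fused.append(row)
    return fused


# Three shots with 2-dimensional descriptors, window of 3 shots.
shots = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
windows = temporal_window_descriptor(shots, half_width=1)
# windows[1] == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Each fused vector has `(2 * half_width + 1)` times the original dimension, so the window width trades temporal context against descriptor size.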

Temporal re-scoring vs. temporal descriptors for semantic indexing of videos

Hamadi, Mulhem, Quénot. 2015

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)

Self Cite

The automated indexing of images and videos is a difficult problem because of the "distance" between the arrays of numbers encoding these documents and the concepts (e.g. people, places, events or objects) with which we wish to annotate them. Methods exist for this but their results are far from satisfactory in terms of generality and accuracy. Existing methods typically use a single set of such examples and consider it as uniform. This is not optimal because the same concept may appear in various contexts and its appearance may be very different depending upon these contexts. The context has been widely used in the state of the art to treat various problems. However, the temporal context seems to be the most crucial and the most effective for the case of videos. In this paper, we present a comparative study between two methods exploiting the temporal context for semantic video indexing. The proposed approaches use temporal information that is derived from two different sources: low-level content and semantic information. Our experiments on the TRECVID'12 collection showed interesting results that confirm the usefulness of the temporal context and demonstrate which of the two approaches is more effective.
