Automatic semantic video annotation in wide domain videos based on similarity and commonsense knowledgebases

Altadmri, Amjad; Ahmed, Amr

doi:10.1109/icsipa.2009.5478723

Cited by 9 publications

(9 citation statements)

References 25 publications

“…There are some pioneering works in [1] and [2] concentrating on generating sentences for videos. [1] introduces a novel two-step framework for textually annotating unconstrained videos: visual similarity video matching at first and then an annotation analysis that employs commonsense knowledge bases.…”

Section: Introduction (mentioning)

confidence: 99%

What Is Happening in the Video? —Annotate Video by Sentence

Qian,

Liu,

et al. 2016

IEEE Trans. Circuits Syst. Video Technol.

Due to the popularity of online video sharing websites such as YouTube, millions of users treat online video as a source of information and entertainment, and video annotation has attracted great interest in the past few years. In this paper, we propose a four-step approach to automatically annotate video shots with sentences. The first step is video preprocessing, which converts a video shot into a sequence of frame images. The second step finds candidate elements of the sentence describing the video contents. The main elements of the sentence are objects, events, scenes, and modifiers. These candidate elements are obtained by searching our collected image datasets, rather than video datasets, for images similar to the video frames. The third step selects the best elements among the candidates with a weighted scoring algorithm. The final step constructs a sentence, using a correlation graph algorithm to analyze the relationships among the best elements. The experimental results indicate that our method is effective at annotating videos with sentences. Moreover, the weighted scoring algorithm and the correlation graph algorithm that we propose are effective in improving the experimental performance.

“…A preliminary version of this work was published in conference-form in [2]. This consolidated version is extended by enhancing the first layer, introducing more technical details, and by performing more experiments on common public databases, with deeper analysis and evaluation, using standard TRECVID measures.…”

Section: Related Work (mentioning)

confidence: 99%

“…In the comparison phase, i.e. the distance measured between the query and each dataset's files, the video signatures are compared, and 2 An example for the framework: an airplane is taking off. The first similar video retrieved is a false positive as it is a car.…”

Section: Layer 1: Visual Similarity (mentioning)

confidence: 99%

A framework for automatic semantic video annotation

Altadmri

Ahmed

2013

Multimed Tools Appl

Self Cite

The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video, because of their user-friendly way of representing the semantics appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and of describing videos. The difference between the low-level visual contents and the corresponding human perception is referred to as the 'semantic gap'. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any previous information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledgebases. Commonsense ontology is created by incorporating multiple-structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework, and the evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.

“…Then ConceptNet is used to calculate the distance between the concepts. In addition to that, in our previous work [9], a fully automated framework for semantic video annotation in wide-domain videos has been presented, based on using WordNet and ConceptNet separately.…”

Section: Previous Work (mentioning)

confidence: 99%

VisualNet: Commonsense knowledgebase for video and image indexing and retrieval application

Altadmri

Ahmed

2009

2009 IEEE International Conference on Intelligent Computing and Intelligent Systems

Self Cite

The rapidly increasing amount of video collections, available on the web or via broadcasting, motivated research towards building intelligent tools for searching, rating, indexing and retrieval purposes. Establishing a semantic representation of visual data, mainly in textual form, is one of the important tasks. The time needed for building and maintaining ontologies and knowledge, especially for wide domain, and the efforts for integrating several approaches emphasize the need for unified generic commonsense knowledgebase for visual applications. In this paper, we propose a novel commonsense knowledgebase that forms the link between the visual world and its semantic textual representation. We refer to it as "VisualNet". VisualNet is obtained by our fully automated engine that constructs a new unified structure concluding the knowledge from two commonsense knowledgebases, namely WordNet and ConceptNet. This knowledge is extracted by performing analysis operations on WordNet and ConceptNet contents, and then only useful knowledge in visual domain applications is considered. Moreover, this automatic engine enables this knowledgebase to be developed, updated and maintained automatically, synchronized with any future enhancement on WordNet or ConceptNet. Statistical properties of the proposed knowledgebase, in addition to an evaluation of a sample application results, show coherency and effectiveness of the proposed knowledgebase and its automatic engine.
