The large amount of available multimedia information (e.g., videos, audio, images) requires efficient and effective annotation and retrieval methods. As videos play an increasingly important role within multimedia, we want to make them available for content-based retrieval. The ImageMiner system, developed in the AI group at the University of Bremen, is designed for content-based retrieval of single images through a new combination of techniques and methods from computer vision and artificial intelligence. Our approach to making videos retrievable in a large database of videos and images involves two necessary steps: first, the detection and extraction of shots from a video, performed by a histogram-based method; second, the composition of the separate frames of a shot into one still image, performed by a mosaicing technique. The resulting mosaiced image provides a one-image visualization of the shot and can be analyzed by the ImageMiner system. ImageMiner has been tested on several domains (e.g., landscape images, technical drawings) that cover a wide range of applications.
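The first of the two steps, histogram-based shot detection, can be illustrated with a minimal sketch. This is not the authors' implementation: the use of OpenCV, the chi-square histogram distance, and the `THRESHOLD` value are illustrative assumptions that would need tuning per corpus.

```python
# Minimal sketch of histogram-based shot detection.
import cv2

THRESHOLD = 0.5  # hypothetical cut-off; must be tuned for the video corpus

def detect_shot_boundaries(video_path):
    """Return the indices of frames at which a new shot begins."""
    cap = cv2.VideoCapture(video_path)
    boundaries = []
    prev_hist = None
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Coarse RGB histogram (8 bins per channel) of the current frame.
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256] * 3)
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # A large difference between the histograms of two successive
            # frames indicates the beginning of a new shot.
            dist = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CHISQR)
            if dist > THRESHOLD:
                boundaries.append(index)
        prev_hist = hist
        index += 1
    cap.release()
    return boundaries
```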
In this paper, videos are analyzed to obtain a content-based description of the video. The structure of a given video is useful for indexing long videos efficiently and automatically. A comparison between shots gives an overview of cut frequency, cut patterns, and scene bounds. After shot detection, the shots are grouped into clusters based on their visual similarity. A time-constrained clustering procedure compares only those shots that lie inside a given time range; shots from different areas of the video (e.g., beginning/end) are not compared. With this cluster information, which lists the shots and their clusters, it is possible to calculate scene bounds. Labeling all clusters characterizes the cut pattern, which makes it easy to distinguish a dialogue from an action scene. The final content analysis is done by the ImageMiner system. The ImageMiner system, developed at the Image Processing Department of the Center for Computing Technology at the University of Bremen, realizes content-based image retrieval for still images through a novel combination of methods and techniques from computer vision and artificial intelligence. The ImageMiner system consists of three computer-vision analysis modules, namely for color, texture, and contour analysis. In addition, there is a module for object recognition, whose output can be indexed by a text retrieval system. Thus, concepts like forest scene may be searched for. We combine the still-image analysis with the results of the video analysis in order to retrieve shots or scenes.
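A sketch of time-constrained clustering, as described above, might look as follows. The shot representation (a feature vector per shot), the Euclidean distance, and both parameters are assumptions for illustration, not the paper's method.

```python
# Illustrative sketch of time-constrained shot clustering. Each shot is
# assumed to be summarized by a feature vector (e.g., a mean color
# histogram); `window` and `max_dist` are hypothetical parameters.
import numpy as np

def time_constrained_clustering(features, window=8, max_dist=0.3):
    """Assign a cluster id to each shot; only shots whose indices lie
    within `window` of each other are ever compared."""
    clusters = [-1] * len(features)
    next_id = 0
    for i, f in enumerate(features):
        # Compare against already-labeled shots inside the time window.
        for j in range(max(0, i - window), i):
            if np.linalg.norm(f - features[j]) < max_dist:
                clusters[i] = clusters[j]
                break
        if clusters[i] == -1:  # no similar shot nearby: open a new cluster
            clusters[i] = next_id
            next_id += 1
    return clusters
```

Given such a label sequence, a scene bound can plausibly be placed wherever no cluster spans the gap, i.e., where the set of clusters occurring before a cut is disjoint from the set occurring after it.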
The large amount of available multimedia information (e.g., videos, audio, images) requires efficient and effective annotation and retrieval methods. The system IRIS (Image Retrieval for Information Systems), developed in the AI group at the University of Bremen, is designed for content-based retrieval of single images. As videos play an increasingly important role within multimedia, we want to make videos available for IRIS [1]. The basic idea for including videos in the IRIS system is to separate the whole video into shots (sequences with a similar content) and to create a single image for each detected shot. This image contains the full information of the shot and can be processed with IRIS. The first part of the video analysis is the detection of shots. This is done by a color-histogram-based method [4], where a large difference in the histograms of two successive images indicates the beginning of a new shot. For every frame in the shot, the dominant camera motion is estimated by determining the optical flow for each pair of successive images in the sequence. This yields the coordinate transformations from one image to the next in the sequence. By applying the appropriate transformations via a warping operation and merging the overlapping regions of the warped images, a single panoramic mosaic image covering the entire visible area of the scene can be constructed [2][3]. The second part of the video analysis follows this process: once the videos are reduced to still images, a textual description can be created using IRIS. The basic concept of IRIS is that it is more natural for human beings to use natural-language concepts, e.g. sky, than syntactical features, e.g. blue region right-up. This leads to content-based image retrieval. Furthermore, it is unreasonable for any human being to create the content descriptions for thousands of images manually. IRIS combines methods and techniques from computer vision and Artificial Intelligence (AI) in a new way to generate content descriptions of images in textual form automatically. The text retrieval can then be done by an ordinary text retrieval system. The two dominating goals of the IRIS system are: (1) the images should be processed automatically; (2) the system should offer a comfortable user interface with a comfortable query vocabulary to formulate more complex queries using concepts. To realize these goals, the IRIS system is divided into two main modules. The first main module is divided into four submodules for image analysis, and the second main module deals with retrieval. There are three modules for low-level feature extraction; the extracted features are color, texture, and contour. The color segmentation makes use of color histograms in the HLS color space. Second-order statistical features feed a classifying neural network to determine the texture segmentation of the image. With gradient-based edge detection and shape analysis, the contour segmentation of the image is performed. Each of these submodules extracts segments concerning one of the features mentioned above. …
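As a rough illustration of the mosaicing step, the sketch below estimates a transformation between successive frames, chains the transforms, and warps every frame into a common reference system. Note that it substitutes ORB feature matching with a RANSAC-fitted homography for the optical-flow motion estimation of [2][3]; all function names, thresholds, and the canvas size are assumptions.

```python
# Hedged sketch of mosaic construction: warp successive frames into the
# first frame's coordinate system and merge the overlapping regions.
import cv2
import numpy as np

def pairwise_homography(img_a, img_b):
    """Estimate the homography that maps img_b into img_a's coordinates."""
    orb = cv2.ORB_create()
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_b, des_a), key=lambda m: m.distance)
    src = np.float32([kp_b[m.queryIdx].pt for m in matches[:50]])
    dst = np.float32([kp_a[m.trainIdx].pt for m in matches[:50]])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)  # needs >= 4 matches
    return H

def build_mosaic(frames):
    """Chain the pairwise transforms and paste each warped frame onto an
    oversized canvas, keeping already-filled pixels (a crude merge)."""
    h, w = frames[0].shape[:2]
    canvas = np.zeros((2 * h, 3 * w, 3), dtype=np.uint8)
    canvas[:h, :w] = frames[0]
    H_acc = np.eye(3)
    for prev, cur in zip(frames, frames[1:]):
        H_acc = H_acc @ pairwise_homography(prev, cur)
        warped = cv2.warpPerspective(cur, H_acc,
                                     (canvas.shape[1], canvas.shape[0]))
        empty = canvas.sum(axis=2) == 0  # fill only uncovered pixels
        canvas[empty] = warped[empty]
    return canvas
```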
Archiving, processing, and intelligent retrieval of digital video sequences can be solved efficiently by a high-level representation of video sequences. We describe a complete system for encoding the contents of video sequences based on their syntactical and semantical description. Our techniques are built on two phases: analysis and synthesis. The analysis phase involves the automatic generation of a syntactical structure using image and image-sequence analysis, image understanding, and text recognition. Moreover, the syntactical description of images is completed by a semantical understanding of video sequences. The full representation of video sequences is generated during the synthesis phase by combining results from the analysis phase. Our prototype system is planned to be integrated into the cooperative work of a TV team at the TV station "Radio Bremen". Additionally, we describe some special functionality: scene analysis by clustering of video sequences with the aim of identifying a typical video genre. The latter is a useful approach for automatic trailer generation and for intelligent video editing as an add-on for a video cut system.