<title>Video segmentation using 3D hints contained in 2D images</title>

Fard, Mohsen Ardebilian; Tu, Xiaowei; Chen, Liming; Faudemay, Pascal

doi:10.1117/12.257293

Cited by 11 publications

(5 citation statements)

References 0 publications


“…However there are still some difficulties in special cases such as flash lights, some special effects, etc. Within the Transdoc project, Ardebilian et al have improved shot detection by using 3D indices, such as focus of expansion, which are very insensitive to intensity [20].…”

Section: Segmentation Issues

mentioning

confidence: 98%

“…This is very important for the video indexing application, where segment size is usually small. However in our study of the movie "Un indien dans la ville" (Faudemay et al 1996) we have found that about 100 speech segments have a sufficient duration for speaker recognition (more than 2 seconds), in a 15 minutes interval of the movie. We have studied the recognition ratio of our approach, on two well known speech databases.…”

mentioning

confidence: 97%


<title>Video indexing based on image and sound</title>

1997

Self Cite


Video indexing is a major challenge for both scientific and economic reasons. Information extraction can sometimes be easier from the sound channel than from the image channel. We first present a multi-channel and multi-modal query interface to query sound, image and script through "pull" and "push" queries. We then summarize the segmentation phase, which needs information from the image channel. Detection of critical segments is proposed; it should speed up both automatic and manual indexing. We then present an overview of the information extraction phase. Information can be extracted from the sound channel through speaker recognition, vocal dictation with unconstrained vocabularies, and script alignment with speech (or "script warping"). We present experimental results for these various techniques. Speaker recognition methods were tested on the TIMIT and NTIMIT databases. Vocal dictation was tested on newspaper sentences spoken by several speakers. Script alignment was tested on part of a cartoon movie, "Ivanhoe". For good-quality sound segments, error rates are low enough for use in indexing applications. Major issues are the processing of sound segments with noise or music, and performance improvement through the use of appropriate, low-cost parallel architectures or networks of workstations.


“…They are based on 3D hints contained in 2D images and dynamic thresholding [9,10]. As compared to other techniques based on global visual features, such as a global color histogram, the experimental results at a real scale show that our technique is more accurate and gives more reliable behavior with respect to annoying effects such as speed camera movements.…”

Section: Introduction

mentioning

confidence: 94%

Robust 3D Clue-Based Video Segmentation for Video Indexing

Ardebilian

Chen

2000

Journal of Visual Communication and Image Representation


“…The first assumption will be relaxed in Section III-D, and the second one in Section III-E. 2 All running times in this paper are reported for a 3GHz PC with 1GB of RAM.…”

Section: B Affine Projection Constraints

mentioning

confidence: 99%

“…The resulting 3D models represent the structural content of the scene, and they can be compared and matched using techniques similar to those in [45]. This is useful for shot matching, i.e., recognizing shots of the same scene [1], [2], [4], [46], [51], [61] -a fundamental task in video retrieval.…”

Section: Introduction

mentioning

confidence: 99%

Segmenting, modeling, and matching video clips containing multiple moving objects

Rothganger

Lazebnik

Schmid

et al.

Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.


This paper presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and are observed by a moving camera. Multi-view constraints associated with groups of affine-covariant scene patches and a normalized description of their appearance are used to segment a scene into its rigid components, construct three-dimensional models of these components, and match instances of models recovered from different image sequences. The proposed approach has been implemented, and it is applied to the detection and matching of moving objects in video sequences and to shot matching, i.e., the identification of shots that depict the same scene in a video clip.
