Proceedings of the International Workshop on TRECVID Video Summarization 2007
DOI: 10.1145/1290031.1290045
NTU TRECVID-2007 fast rushes summarization system

Abstract: Rushes are the raw materials used to produce a video. They often contain redundant and repetitive content. Rushes summarization aims to provide a quick overview of a rushes video. As part of TRECVID 2007, NIST initiated a rushes summarization task. This paper reports on the design of the NTU rushes summarization system for this task. Our system consists of three components: shot segmentation, redundant shot detection, and summary creation. To tackle the bulky rushes, we focus on efficient but effective feature re…
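As an illustration only, the following Python skeleton mirrors the three components named in the abstract (shot segmentation, redundant shot detection, summary creation). Every function name, signature, and the Shot type are hypothetical placeholders for how such a pipeline could be wired together, not the authors' implementation.

from typing import List, Tuple

Shot = Tuple[int, int]  # (start_frame, end_frame) of one shot

def segment_shots(video_path: str) -> List[Shot]:
    # Stage 1 (hypothetical interface): split the rushes video into shots.
    raise NotImplementedError

def drop_redundant_shots(shots: List[Shot]) -> List[Shot]:
    # Stage 2 (hypothetical interface): detect and discard repeated takes
    # and other redundant shots.
    raise NotImplementedError

def create_summary(shots: List[Shot], budget_seconds: float) -> List[Shot]:
    # Stage 3 (hypothetical interface): assemble the retained shots into a
    # summary that fits the duration budget.
    raise NotImplementedError

def summarize_rushes(video_path: str, budget_seconds: float) -> List[Shot]:
    # Chain the three stages in the order the abstract lists them.
    shots = segment_shots(video_path)
    kept = drop_redundant_shots(shots)
    return create_summary(kept, budget_seconds)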

Cited by 16 publications (12 citation statements)
References 5 publications
“…Soccer events can be detected by using temporal logic models [23] or goalmouth detection [24]. Much attention has been paid to rush video summarization [25]- [27]. Rush videos often contain redundant and repetitive contents, by exploring which a concise summary can be generated.…”
Section: Related Work (mentioning)
confidence: 99%
“…
- A straightforward approach (denoted S_HSV in the sequel) which assesses the similarity between subsequent video frames with the help of HSV histograms and the χ² distance, and a variation of it (denoted S_DCT) that represents the visual content of the video frames using DCT features and estimates their visual resemblance based on the cosine similarity.
- A method (denoted B_HSV) similar to [26], that selects the first frame of the video F_a as the base frame and compares it sequentially with the following ones using HSV histograms and the χ² distance until some frame F_b is different enough; then frames between F_a and F_b form a sub-shot, and F_b is used as the next base frame in a process that is repeated until all frames of the video have been processed. A variation of this approach (denoted B_DCT) that represents the visual content of the video frames using DCT features and estimates their visual resemblance based on the cosine similarity was also implemented.
- The algorithm of [8] (denoted A_SIFT), which estimates the dominant motion between a pair of frames based on the computed parameters of a 3 × 3 affine model through the extraction and matching of SIFT descriptors; furthermore, variations of this approach that rely on the use of SURF (denoted A_SURF) and ORB [27] (denoted A_ORB) descriptors were also implemented for assessing the efficiency of faster alternatives to SIFT.…”
Section: Experiments and Results (mentioning)
confidence: 99%
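A minimal sketch, assuming OpenCV, of the base-frame scheme the quoted passage attributes to [26] (the B_HSV variant): each frame is compared against the current base frame via the χ² distance between HSV histograms, and a new sub-shot starts once the distance exceeds a threshold. The bin counts, the threshold value, and the function names are illustrative assumptions rather than values reported in the cited papers.

import cv2

def hsv_hist(frame, bins=(8, 8, 8)):
    # L1-normalised HSV histogram; the 8x8x8 binning is an assumed choice.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist, alpha=1.0, norm_type=cv2.NORM_L1)

def base_frame_subshots(video_path, threshold=0.35):
    # Compare each frame to the current base frame with the chi-square
    # distance; open a new sub-shot (and take a new base frame) when the
    # distance exceeds `threshold` (an assumed value).
    cap = cv2.VideoCapture(video_path)
    boundaries, idx, base_hist = [0], 0, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        h = hsv_hist(frame)
        if base_hist is None:
            base_hist = h
        elif cv2.compareHist(base_hist, h, cv2.HISTCMP_CHISQR) > threshold:
            boundaries.append(idx)  # this frame becomes the next base frame
            base_hist = h
        idx += 1
    cap.release()
    return boundaries  # first frame index of each sub-shot

Replacing hsv_hist with a DCT-based descriptor and the χ² distance with cosine similarity would give the B_DCT variation mentioned in the same passage.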
“…Based on this assumption, they try to define sub-shots by assessing the visual similarity of consecutive or neighbouring video frames. A rather straightforward approach that evaluates frames' similarity using colour histograms and the χ² test was described in [26], while a method that detects sub-shots of a video by assessing the visual dissimilarity of frames lying within a sliding temporal window using 16-bin HSV histograms (denoted as "Eurecom segmentation") was reported in [11]. A different approach [3] estimates the grid-level dissimilarity between pairs of frames and segments a video by observing that the cumulative difference in the visual content of subsequent frames indicates gradual change within a sub-shot; a similar approach was presented in [20].…”
Section: Related Work (mentioning)
confidence: 99%
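A minimal sketch, assuming NumPy and grayscale frames, of the cumulative-difference idea the quoted passage attributes to [3]: grid-level frame-to-frame changes are accumulated, and a sub-shot boundary is declared once the accumulated change exceeds a threshold. The grid size, the descriptor, and the threshold are assumptions made for illustration.

import numpy as np

def grid_descriptor(frame, rows=4, cols=4):
    # Mean intensity per cell of a rows x cols grid (assumed grid size);
    # `frame` is a 2-D grayscale array.
    h, w = frame.shape
    cells = frame[:h - h % rows, :w - w % cols].reshape(
        rows, h // rows, cols, w // cols)
    return cells.mean(axis=(1, 3)).ravel()

def cumulative_subshots(frames, threshold=120.0):
    # Accumulate grid-level differences between consecutive frames and
    # start a new sub-shot once the accumulated change exceeds
    # `threshold` (an assumed value).
    boundaries, acc, prev = [0], 0.0, None
    for i, frame in enumerate(frames):
        d = grid_descriptor(frame)
        if prev is not None:
            acc += float(np.abs(d - prev).sum())
            if acc > threshold:
                boundaries.append(i)
                acc = 0.0
        prev = d
    return boundaries  # first frame index of each sub-shot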
“…[28]), while there is a group of methods that targeted the indexing and summarization of rushes video (e.g. [12,25,4,36]). The majority of the suggested approaches can be grouped in two main classes of methodologies.…”
Section: Video Fragmentation (mentioning)
confidence: 99%
“…Based on this assumption, they try to define sub-shots by assessing the visual similarity of consecutive or neighboring video frames. A rather straightforward approach that evaluates frames' similarity using colour histograms and the χ² test was described in [36], while a method that detects sub-shots of a video by assessing the visual dissimilarity of frames lying within a sliding temporal window using 16-bin HSV histograms (denoted as "Eurecom fragmentation") was reported in [12]. Instead of using HSV histograms, the video fragmentation and keyframe selection approach described in [39] represents the visual content of each video frame with the help of the Discrete Cosine Transform (DCT) and assesses the visual similarity of neighboring video frames based on the cosine similarity.…”
Section: Video Fragmentation (mentioning)
confidence: 99%
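A minimal sketch, assuming OpenCV and NumPy, of the DCT-plus-cosine-similarity comparison the quoted passage attributes to [39]: each frame is reduced to its low-frequency DCT coefficients, and neighboring frames are compared by cosine similarity, with values near 1 indicating visually similar frames. The downscaled resolution and the number of retained coefficients are assumed parameters, not values from the cited paper.

import cv2
import numpy as np

def dct_descriptor(frame, size=64, keep=8):
    # Low-frequency DCT coefficients of a downscaled grayscale frame;
    # `size` and `keep` are assumed parameters.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (size, size)).astype(np.float32)
    coeffs = cv2.dct(small)[:keep, :keep].ravel()
    return coeffs / (np.linalg.norm(coeffs) + 1e-8)

def frame_similarity(frame_a, frame_b):
    # Cosine similarity of the two unit-norm DCT descriptors.
    return float(np.dot(dct_descriptor(frame_a), dct_descriptor(frame_b)))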