In this paper, we present an effective hierarchical shot classification scheme for broadcast soccer video. We first partition a video into replay and non-replay shots using replay logo detection. Then, non-replay shots are further classified into Long, Medium, Close-up, or Out-field types using color and texture features in a decision tree. We tested the method on real broadcast FIFA soccer videos, and the experimental results demonstrate its effectiveness.
Correspondence to: qsliu@nlpr.ia.ac.cn. Recommended for acceptance by Xuelong Li.

Introduction

In recent years, sports video analysis has received increasing interest due to its tremendous commercial potential. A popular scheme for sports video analysis has three steps: 1) parsing the video into shots; 2) classifying the shots into several types with certain semantics; 3) inferring high-level events from the shot transition context. In this paper, we focus on shot classification in soccer video.

A lot of work has been done on shot classification in soccer video. The grass ratio and the distribution of non-field area are often used as important features in previous work [1][2][3][12][13] to discriminate among Long, Medium, and Close-up view shots. In [4], the authors used moments of colour and shape as features to identify shot types, and modelled the temporal shot transition pattern with a Hidden Markov Model (HMM) to detect semantic events. Duan et al. [5] extracted shape and motion information to categorize shots with Support Vector Machine (SVM) classifiers.

Replay, as a special scene in sports video, also plays an important role in semantic event analysis. H. Pan et al. [9] utilized an HMM to infer replay scenes, in which a zero-crossing measure captured the frequency and amplitude of the fluctuations of adjacent frame differences. However, a single frame inside a replay must be pinpointed in advance. Later, they proposed another replay detection method based on the replay logo [14]. Kobla et al. [15,16] used macroblock, motion, and bit-rate information in the compressed domain to detect replay scenes. Duan et al. [17] developed a logo transition detection method, in which the mean shift algorithm was used to seek the mode of the logo transition, and a replay scene was then determined by a pair of logo transitions. Wang et al. [18] utilized