“…Much research has attempted to automatically classify an entire video clip into one of several categories, such as sports, news, cartoons, and music. In general, previous methods fall into four types: text-based approaches [1,19], audio-feature-based approaches [11,12,13,14], visual-feature-based approaches [5,16,18], and approaches that use some combination of text, audio, and visual features [4,5,8]. In fact, most authors incorporated both audio and visual features into their approaches (which we call content-based approaches); therefore, most approaches employ more than one modality.…”