Interspeech 2016
DOI: 10.21437/interspeech.2016-472

Automatic Genre and Show Identification of Broadcast Media

Abstract: Huge amounts of digital videos are being produced and broadcast every day, leading to giant media archives. Effective techniques are needed to make such data accessible further. Automatic meta-data labelling of broadcast media is an essential task for multimedia indexing, where it is standard to use multi-modal input for such purposes. This paper describes a novel method for automatic detection of media genre and show identities using acoustic features, textual features or a combination thereof. Furthermore th…

Cited by 10 publications (8 citation statements)
References 18 publications
“…Pre-training For feature based DNN adaptive training, the RNN-LDA or LDA cluster labels can be used as appending features as mentioned in section 2.2. However, [24,21] showed that acoustic condition variations can be represented by distribution of these clusters. Hence, using the cluster log posterior probabilities…”
Section: Lstm Lstm Lstm
Citation type: mentioning
confidence: 99%
“…During training, per-utterance clusters are discovered by a LDA model, and utilized as latent acoustic condition indicator features for DNN adaptive training. Hence, acoustic conditions are not necessary to provide explicitly, and the variation can be represented by distribution of these LDA clusters [21,24] implicitly. However, LDA clustering normally treats the data as bags of discrete features, thus ignores the transition and correlation between neighboring clusters.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…To further improve a realtime system performance, an Automatic Speech Recognition (ASR) model was used in [9] to detect key words that enriches the word vectors representing a programme. Recently, Mortaza et al [10] provided a comprehensive analysis of main features such as audio, text and the other metadata (channel, time) that were used for BBC broadcast classification. From the existing literature of audio feature extraction in broadcast classification [8], [10]- [12], we can see that these methods followed two main steps: Acoustic modeling on short-time segments and statistical modeling across the programme.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Recently, Mortaza et al [10] provided a comprehensive analysis of main features such as audio, text and the other metadata (channel, time) that were used for BBC broadcast classification. From the existing literature of audio feature extraction in broadcast classification [8], [10]- [12], we can see that these methods followed two main steps: Acoustic modeling on short-time segments and statistical modeling across the programme. For example, Ekenel et al [8] firstly extracted Mel-frequency cepstral coefficients, fundamental frequency, signal energy, zero crossing rate from short-time segments split from each programme.…”
Section: Introduction
Citation type: mentioning
confidence: 99%