Automatic Genre and Show Identification of Broadcast Media

Doulaty, Mortaza; Saz, Óscar; Ng, Raymond W. M.; Hain, Thomas

doi:10.21437/interspeech.2016-472

Cited by 10 publications

(8 citation statements)

References 18 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Pre-training For feature based DNN adaptive training, the RNN-LDA or LDA cluster labels can be used as appending features as mentioned in section 2.2. However, [24,21] showed that acoustic condition variations can be represented by distribution of these clusters. Hence, using the cluster log posterior probabilities…”

Section: Lstm Lstm Lstmmentioning

confidence: 99%

“…During training, per-utterance clusters are discovered by a LDA model, and utilized as latent acoustic condition indicator features for DNN adaptive training. Hence, acoustic conditions are not necessary to provide explicitly, and the variation can be represented by distribution of these LDA clusters [21,24] implicitly. However, LDA clustering normally treats the data as bags of discrete features, thus ignores the transition and correlation between neighboring clusters.…”

Section: Introductionmentioning

confidence: 99%

“…which can be used to represent the latent acoustic conditions of speech from different domains [24,21].…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

RNN-LDA Clustering for Feature Based DNN Adaptation

Xie¹,

Liu²,

Lee³

et al. 2017

Interspeech 2017

View full text Add to dashboard Cite

Model based deep neural network (DNN) adaptation approaches often require multi-pass decoding in test time. Input feature based DNN adaptation, for example, based on latent Dirichlet allocation (LDA) clustering, provide a more efficient alternative. In conventional LDA clustering, the transition and correlation between neighboring clusters is ignored. In order to address this issue, a recurrent neural network (RNN) based clustering scheme is proposed to learn both the standard LDA cluster labels and their natural correlation over time in this paper. In addition to directly using the resulting RNN-LDA as input features during DNN adaptation, a range of techniques were investigated to condition the DNN hidden layer parameters or activation outputs on the RNN-LDA features. On a DARPA Gale Mandarin Chinese broadcast speech transcription task, the proposed RNN-LDA cluster features adapted DNN system outperformed both the baseline un-adapted DNN system and conventional LDA features adapted DNN system by 8% relative on the most difficult Phoenix TV subset. Consistent improvements were also obtained after further combination with model based adaptation approaches.

show abstract

Section: Lstm Lstm Lstmmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

RNN-LDA Clustering for Feature Based DNN Adaptation

Xie¹,

Liu²,

Lee³

et al. 2017

Interspeech 2017

View full text Add to dashboard Cite

show abstract

“…To further improve a realtime system performance, an Automatic Speech Recognition (ASR) model was used in [9] to detect key words that enriches the word vectors representing a programme. Recently, Mortaza et al [10] provided a comprehensive analysis of main features such as audio, text and the other metadata (channel, time) that were used for BBC broadcast classification. From the existing literature of audio feature extraction in broadcast classification [8], [10]- [12], we can see that these methods followed two main steps: Acoustic modeling on short-time segments and statistical modeling across the programme.…”

Section: Introductionmentioning

confidence: 99%

“…Recently, Mortaza et al [10] provided a comprehensive analysis of main features such as audio, text and the other metadata (channel, time) that were used for BBC broadcast classification. From the existing literature of audio feature extraction in broadcast classification [8], [10]- [12], we can see that these methods followed two main steps: Acoustic modeling on short-time segments and statistical modeling across the programme. For example, Ekenel et al [8] firstly extracted Mel-frequency cepstral coefficients, fundamental frequency, signal energy, zero crossing rate from short-time segments split from each programme.…”

Section: Introductionmentioning

confidence: 99%

An Audio-Based Deep Learning Framework For BBC Television Programme Classification

Pham¹,

Baume²,

Kong³

et al. 2021

Preprint

View full text Add to dashboard Cite

This paper proposes a deep learning framework for classification of BBC television programmes using audio. The audio is firstly transformed into spectrograms, which are fed into a pre-trained Convolutional Neural Network (CNN), obtaining predicted probabilities of sound events occurring in the audio recording. Statistics for the predicted probabilities and detected sound events are then calculated to extract discriminative features representing the television programmes. Finally, the embedded features extracted are fed into a classifier for classifying the programmes into different genres. Our experiments are conducted over a dataset of 6,160 programmes belonging to nine genres labelled by the BBC. We achieve an average classification accuracy of 93.7% over 14-fold cross validation. This demonstrates the efficacy of the proposed framework for the task of audiobased classification of television programmes.

show abstract