Interspeech 2016
DOI: 10.21437/interspeech.2016-540

Robust Multichannel Gender Classification from Speech in Movie Audio

Abstract: Speech in the form of scripted dialogues forms an important part of the audio signal in movies. However, it is often masked by background audio signals such as music, ambient noise or background chatter. These background sounds make even otherwise simple tasks, such as gender classification, challenging. Additionally, the variability in this noise across movies renders standard approaches to source separation or enhancement inadequate. Instead, we exploit multichannel information present in different language …

Cited by 8 publications (10 citation statements)
References 13 publications

“…We performed a comprehensive evaluation of our system by first evaluating the BLSTM-VAD that was trained with movie audio forced-aligned with the subtitles using Gentle. We compared the performance of our VAD to that of the state-of-the-art for movie data, and also against openSMILE VAD, which is used in [5]. We then evaluated all gender identification models with, a) ground-truth (or oracle) VAD [18], and b) the proposed BLSTM-VAD (see Sec 4).…”
Section: Experiments and Results (mentioning)
confidence: 99%
“…The dataset was partitioned into 82 movies for training and 13 movies for system development. The development split was chosen to have a sample representative of different movie genres [5].…”
Section: Movie Dataset for VAD Training (mentioning)
confidence: 99%