Interspeech 2017
DOI: 10.21437/interspeech.2017-792

Enhanced Feature Extraction for Speech Detection in Media Audio

Abstract: Speech detection is an important first step for audio analysis of media content; its goal is to discriminate speech from non-speech. It remains a challenge owing to the variety of sound sources included in media audio. In this work, we present a novel audio feature extraction method that reflects the acoustic characteristics of media audio in the time-frequency domain. Since the degree of combination of harmonic and percussive components varies depending on the type of sound source, the audio featur…

Cited by 8 publications (6 citation statements)
References 16 publications
“…The process is divided into voice signal preprocessing and voice feature extraction. The four steps involved in voice signal preprocessing, (16)(17)(18) digital sampling, pre-emphasis, framing, and windowing, are shown in Fig. 2.…”
Section: Extraction Of Voice Features (mentioning)
confidence: 99%
“…To tackle speech signals corrupted by noise, in this field, some of previous studies [4,5] tended to recover original signals by removing noise. Some methods [6,7] focused on feature extraction from un-corrupted voices, and some methods [8,9] tried to estimated speech quality by computing signal-to-noise ratio (SNR). Although speech enhancement has been used for speaker recognition, in most of previous studies it was often processed individually.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some of previous studies [4,5] tended to recover original signals by removing noise. Some methods [6,7] focused on feature extraction from un-corrupted speech signals, and some methods [8,9] tried to estimated speech quality by computing signal-to-noise ratio (SNR).…”
Section: Introduction (mentioning)
confidence: 99%
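
The first citation statement above names four preprocessing steps: digital sampling, pre-emphasis, framing, and windowing. As a rough illustration only, the last three could be sketched in plain Python as below. This is not the method of the indexed paper or of the citing paper; the function names, the pre-emphasis coefficient 0.97, the Hamming window, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are all assumptions chosen because they are common defaults.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop_len)
    ]


def hamming_window(frame_len):
    """Symmetric Hamming window of the given length."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window an already-sampled signal."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming_window(frame_len)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, hop_len)
    ]


if __name__ == "__main__":
    # One second of a 440 Hz tone at the assumed 16 kHz sampling rate.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    frames = preprocess(tone)
    print(len(frames), len(frames[0]))
```

The sampling step is taken as already done: `preprocess` expects a list of samples. Feature extraction would then operate on each windowed frame.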