2014 IEEE International Congress on Big Data (2014)
DOI: 10.1109/bigdata.congress.2014.138

Speech and Singing Discrimination for Audio Data Indexing

Cited by 3 publications (2 citation statements)
References 13 publications
“…Some works combine short-term features related with the spectral envelope and long-term features related with prosody to train Gaussian Mixture Models (GMM) and distinguish speech and singing. In [1], the authors found that short-term features work better for segments shorter than 1 s and pitch related features obtain the best results for segments longer than 1 s. On the contrary, the work in [13] finds that spectral features work better than pitch based ones when used alone in their database composed of segments between 17 and 26 s. Regardless, they obtained the best results when combining both types of features. In [14], a large set of 276 attributes related with spectral envelope, pitch, harmonic to noise ratio and other characteristics and a ensemble of classifiers are proposed to classify singing voice, speech and polyphonic music and get very good results.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The second approach uses long-term features for classification. In previous works, distribution of pitch values [9], pitch parameters [10] and note grammars [11], [12] are used. These long term features can be calculated for the whole file or for a window that slides through the audio file.…”
Section: Introduction
Citation type: mentioning
confidence: 99%