2005 IEEE International Conference on Multimedia and Expo
DOI: 10.1109/icme.2005.1521485
Separation of Voice and Music by Harmonic Structure Stability Analysis

Abstract: Separation of voice and music is an interesting but difficult problem. It is useful for many other research tasks, such as audio content analysis. In this paper, the difference between voice and music signals is carefully studied, and Harmonic Structure Stability is proposed as the key difference between them. A separation algorithm based on this idea is then proposed. The main idea is to learn the average harmonic structure of the music and then separate the signals by using it to distinguish voice and music har…
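The idea stated in the abstract can be illustrated with a minimal, hypothetical Python sketch (not the authors' implementation): an average harmonic structure (AHS) is learned from music-only frames, and an extracted harmonic structure is then labeled as music or voice by its distance to that average. Harmonic extraction (pitch tracking and peak picking) is assumed to happen upstream, and all function names and the threshold value are illustrative.

```python
# Minimal sketch (not the paper's implementation): classify extracted harmonic
# structures as "music" or "voice" by their distance to a learned Average
# Harmonic Structure (AHS) of the accompaniment.
import numpy as np

def normalize_structure(amplitudes):
    """Convert harmonic amplitudes to a scale-invariant structure vector
    (log amplitudes relative to the first harmonic)."""
    amps = np.asarray(amplitudes, dtype=float)
    logs = 20.0 * np.log10(np.maximum(amps, 1e-12))
    return logs - logs[0]

def learn_average_structure(training_frames):
    """Average the normalized harmonic structures of frames known to be
    music-only (e.g. taken from purely instrumental passages)."""
    structures = np.stack([normalize_structure(f) for f in training_frames])
    return structures.mean(axis=0)

def classify_frame(frame_amplitudes, music_ahs, threshold=6.0):
    """Label a harmonic structure as 'music' if it is close (in dB) to the
    learned average structure, otherwise as 'voice'. The threshold is an
    illustrative value, not one taken from the paper."""
    structure = normalize_structure(frame_amplitudes)
    distance = np.mean(np.abs(structure - music_ahs))
    return "music" if distance < threshold else "voice"

# Toy usage: a stable instrumental structure vs. a differently shaped one.
instrumental = [[1.0, 0.5, 0.25, 0.12], [1.0, 0.48, 0.26, 0.13]]
ahs = learn_average_structure(instrumental)
print(classify_frame([1.0, 0.5, 0.24, 0.12], ahs))  # -> music
print(classify_frame([1.0, 0.9, 0.8, 0.7], ahs))    # -> voice
```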

Cited by 6 publications (2 citation statements) | References: 8 publications

Citation statements:
“…Recently, some scholars have used spectrum matching and spectral filtering techniques for music source separation [2]; Computational Auditory Scene Analysis (CASA) reconstructs the audio stream into auditory objects that share the same psychoacoustic characteristics in order to separate music mixtures [3,4]. Methods based on the sinusoidal model extract sinusoidal tracks from time-frequency (T-F) representations of the signal and then apply grouping rules to assign these tracks to different sources [5,6,7].…”
Section: Introduction
confidence: 99%
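The sinusoidal-track approach mentioned in the statement above can be sketched as follows. This is a hedged illustration, not code from any of the cited papers: spectral peaks detected in consecutive frames are linked into tracks when their frequencies are close enough, after which grouping rules (e.g. a shared fundamental) could assign tracks to sources. The function name and the `max_jump_hz` parameter are assumptions for the example.

```python
def link_peaks_into_tracks(frames, max_jump_hz=20.0):
    """frames: list of lists of peak frequencies (Hz), one list per frame.
    Returns tracks as lists of (frame_index, frequency) pairs."""
    tracks, active = [], []
    for t, peaks in enumerate(frames):
        unmatched = list(peaks)
        next_active = []
        for track in active:
            last_freq = track[-1][1]
            # Continue the track with the closest peak within max_jump_hz.
            candidates = [f for f in unmatched if abs(f - last_freq) <= max_jump_hz]
            if candidates:
                best = min(candidates, key=lambda f: abs(f - last_freq))
                unmatched.remove(best)
                track.append((t, best))
                next_active.append(track)
            else:
                tracks.append(track)  # no continuation: the track ends here
        for f in unmatched:           # start new tracks for unmatched peaks
            next_active.append([(t, f)])
        active = next_active
    return tracks + active

# Toy usage: two slowly varying partials across three frames.
print(link_peaks_into_tracks([[440.0, 880.0], [442.0, 884.0], [445.0, 889.0]]))
```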
“…Extensions of BSS exist for speaker separation that are not bound by this constraint on the number of independent observations. These solutions typically rely on constraints inspired by the characteristics of voice, such as the regular harmonic structure of voiced phonemes (Zhang and Zhang 2006), a priori knowledge of sound models (Kristjansson et al. 2004, Potamitis and Ozerov 2008, Jang and Lee 2004), or the assumption that fundamental frequencies do not overlap (Barry et al. 2005). In the context of HS processing these approaches have some disadvantages.…”
Section: Introduction
confidence: 99%