2015 23rd European Signal Processing Conference (EUSIPCO) 2015
DOI: 10.1109/eusipco.2015.7362592

Keyword spotting in singing with duration-modeled HMMs

Abstract: Keyword spotting in speech is a very well-researched problem, but there are almost no approaches for singing. Most speech-based approaches cannot be applied easily to singing because the phoneme durations in singing vary a lot more than in speech, especially the vowel durations. To represent expected phoneme durations, several duration modeling techniques have been developed over the years in the field of ASR. To the best of our knowledge, these approaches have not been used for keyword spotting yet. In this p…

Cited by 6 publications (12 citation statements)
References 19 publications
“…The second topic related to our research is the singing voice lyrics-to-audio alignment. Most of these works [11,12,13,14,15,16,17,18] used the forced alignment method accompanied by music-related techniques. Loscos et al [12] used MFCCs with additional features and also explored specific HMM topologies.…”
Section: Related Work (mentioning)
confidence: 99%
“…Iskandar et al [15] constrained the alignment by using musical note length distribution. Gong et al [16], Kruspe [17], Dzhambazov and Serra [18] all used syllable/phoneme duration extracted from the musical score and decoded the alignment path by duration-explicit HMM models. Chien et al [19] introduced an approach based on vowel likelihood models.…”
Section: Related Work (mentioning)
confidence: 99%
“…Because of the implicity of the Markovian state occupancy, the phonetic duration distribution introduced in section 3.3 cannot be imposed. Kruspe [12] presents two duration modeling techniques for HMMs: Hidden semi-Markov model (HSMM) and post-processor duration model.…”
Section: Duration Modeling (mentioning)
confidence: 99%
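
The implicit state occupancy mentioned in the statement above can be illustrated with a short sketch. In a standard HMM, a state with self-transition probability a is occupied for exactly d frames with probability a^(d-1) * (1 - a), which is a geometric distribution, so short durations are always the most likely. The function names and the example value 0.9 are illustrative assumptions and do not come from the cited papers.

```python
def implicit_duration_pmf(self_loop: float, d: int) -> float:
    """Probability that a standard HMM state is occupied for exactly d frames.

    self_loop is the state's self-transition probability (assumed 0 <= a < 1).
    The state loops d - 1 times, then leaves once.
    """
    if d < 1:
        raise ValueError("duration must be at least one frame")
    return self_loop ** (d - 1) * (1.0 - self_loop)


def expected_duration(self_loop: float) -> float:
    """Mean of the geometric duration distribution, in frames."""
    return 1.0 / (1.0 - self_loop)


# Illustrative value only: a self-loop probability of 0.9.
pmf = [implicit_duration_pmf(0.9, d) for d in range(1, 6)]
# The probabilities decrease monotonically with d, so this model cannot
# express "this vowel most likely lasts about 40 frames".
```

Because the distribution always peaks at d = 1, a long sung vowel can only be accommodated by a self-loop probability close to 1, which is one way to read the motivation for explicit duration models.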
“…The post-processor duration model was first introduced by Juang et al [10]. It was then experimentally proved in Kruspe's paper [12] that this duration model works better than HSMMs for the keyword spotting task in English pop singing voice. The post-processor duration model uses the original HMM Viterbi algorithm; therefore, during the decoding process no explicit occupancy duration distribution is imposed.…”
Section: Post-processor Duration Model (mentioning)
confidence: 99%
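
The post-processing idea described in the statement above can be sketched as follows: decoding runs with the ordinary Viterbi algorithm, and only afterwards is each candidate state path rescored by adding a weighted sum of log duration probabilities for its state runs. This is a minimal sketch under stated assumptions, not the formulation of the cited papers: the function names, the `weight` parameter, and the toy duration penalty are all hypothetical.

```python
from itertools import groupby
from typing import Callable, Hashable, List, Sequence, Tuple


def state_durations(path: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse a frame-level state path into (state, run length) pairs."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(path)]


def rescore(
    viterbi_logp: float,
    path: Sequence[Hashable],
    log_duration_prob: Callable[[Hashable, int], float],
    weight: float = 1.0,
) -> float:
    """Add a weighted duration score to a Viterbi log-likelihood.

    log_duration_prob(state, d) is an assumed, externally supplied model of
    how plausible it is for `state` to last d frames.
    """
    duration_score = sum(log_duration_prob(s, d) for s, d in state_durations(path))
    return viterbi_logp + weight * duration_score


def toy_log_duration(state: Hashable, d: int) -> float:
    """Toy penalty (assumption): every state is expected to last 2 frames."""
    return -abs(d - 2)


# Two hypothetical candidate paths with their Viterbi log-likelihoods.
candidates = [
    (-10.0, ["a", "a", "b", "b"]),       # plausible durations
    (-9.5, ["a", "b", "b", "b", "b"]),   # slightly better score, odd durations
]
best = max(candidates, key=lambda c: rescore(c[0], c[1], toy_log_duration))
```

In this toy example the first candidate wins after rescoring even though its raw Viterbi score is lower, which shows how duration knowledge can be applied without changing the decoder itself.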