Interspeech 2018
DOI: 10.21437/interspeech.2018-1224

Singing Voice Phoneme Segmentation by Hierarchically Inferring Syllable and Phoneme Onset Positions

Abstract: In this paper, we tackle the singing voice phoneme segmentation problem in the singing training scenario by using language-independent information: onset and prior coarse duration. We propose a two-step method. In the first step, we jointly calculate the syllable and phoneme onset detection functions (ODFs) using a convolutional neural network (CNN). In the second step, the syllable and phoneme boundaries and labels are inferred hierarchically by using a duration-informed hidden Markov model (HMM). To achieve t…

Citations: cited by 1 publication (1 citation statement)
References: 22 publications (32 reference statements)
“…While great progress has been made regarding lyrics alignment at word level using resource intensive methods [7], [8], phoneme level alignment is rarely addressed although the methods in [7], [8] could be adapted to it. In fact, when phoneme alignment is required, they are often aligned manually [9], [10] or tools such as [11] are used [8], [12], [13] which employ acoustic models based on Gaussian Mixture Model-Hidden Markov Models (GMM-HMM) and do not work well on mixed singing voice as will be shown in Section V-B1.…”
Section: Introduction
Citation type: mentioning
confidence: 99%