2012
DOI: 10.1109/tasl.2011.2159595

Integrating Additional Chord Information Into HMM-Based Lyrics-to-Audio Alignment

Abstract: Aligning lyrics to audio has a wide range of applications such as the automatic generation of karaoke scores, song-browsing by lyrics, and the generation of audio thumbnails. Existing methods are restricted to using only lyrics and match them to phoneme features extracted from the audio (usually mel-frequency cepstral coefficients). Our novel idea is to integrate the textual chord information provided in the paired chords-lyrics format known from song books and Internet sites into the inference proced…

Citations: cited by 38 publications (46 citation statements)
References: 18 publications (28 reference statements)
“…The baseline acoustic model (C1) is trained on solo-singing DAMP subset-train with the 40-dimensional MFCCs and 100-dimensional i-vectors. To test the performance of the additional features, extracted using OpenSMILE toolbox [22], we append the five feature groups with a total dimension of 154 to the 140-dimensional baseline feature vector (C2).…”
[Footnote 2 in the citing paper: There are a total of 105 songs in the ground-truth data, out of which the audio file links to 6 songs are not accessible from Singapore.]
[Footnote 3 in the citing paper: The word boundary ground-truth of the songs clocks and i kissed a girl were not accurate, hence excluded from this study.]
Section: System Configurations (mentioning)
confidence: 99%
“…The task of lyrics-to-audio alignment is often seen as an extension of the speech-to-text alignment task. ASR systems have been used to force-align lyrics to singing vocals [1][2][3][4][5]. Singing voice, however, covers a much wider range of intrinsic variations than speech both in terms of timbre and fundamental frequencies [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…Research into lyric synchronization targeting singing with accompaniment can be divided into two categories: that using no forced alignment [41][42][43] and that using forced alignment [44][45][46][47][48]. When textual chord information is additionally given, it can be used to improve the accuracy of lyric synchronization [49].…”
Section: Lyric Transcription and Synchronization (mentioning)
confidence: 99%
“…Perhaps due to the challenging nature of performing full transcription of the sung voice, researchers have mostly in the past concentrated on the task of aligning/synchronising lyrics to audio, where the task is to assign timestamps to a set of lyrics given the corresponding audio (see, for example, [12], [17][18][19][20]). However, there are clearly situations in which ALR is required.…”
Section: Automatic Lyric Alignment/Recognition (mentioning)
confidence: 99%