Proceedings of the 7th Nordic Signal Processing Symposium - NORSIG 2006, 2006
DOI: 10.1109/norsig.2006.275206

Log Likelihood Ratio Based Annotation Verification of a Norwegian Speech Synthesis Database

Abstract: Accurate labeling and segmentation of the unit inventory database is of vital importance to the quality of unit selection text-to-speech synthesis. Misalignments and mismatch between the predicted and pronounced unit sequences require manual correction to achieve natural sounding synthesis. In this paper we have used a log likelihood ratio based utterance verification to automatically detect annotation errors in a Norwegian two-speaker synthesis database. Each sentence is assigned a confidence score and those …
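
The abstract describes assigning each sentence a confidence score from a log likelihood ratio. A minimal sketch of what such per-sentence scoring could look like follows. It is not the authors' implementation: the `SentenceScores` container, the frame-level normalization, the choice of alternative model, and the rule of flagging sentences below a threshold are all assumptions made for illustration, and the per-frame log likelihoods are taken as already computed by some acoustic model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceScores:
    """Hypothetical container: per-frame log likelihoods for one sentence."""
    sentence_id: str
    target_loglik: List[float]       # log p(frame | model of the annotated units)
    alternative_loglik: List[float]  # log p(frame | competing model), an assumption


def llr_confidence(scores: SentenceScores) -> float:
    """Return the frame-normalized log likelihood ratio for one sentence."""
    if len(scores.target_loglik) != len(scores.alternative_loglik):
        raise ValueError("target and alternative frame counts differ")
    if not scores.target_loglik:
        raise ValueError("sentence has no frames")
    # Sum of per-frame differences equals the difference of the summed log likelihoods.
    total = sum(t - a for t, a in zip(scores.target_loglik, scores.alternative_loglik))
    return total / len(scores.target_loglik)


def flag_suspects(sentences: List[SentenceScores], threshold: float) -> List[str]:
    """Return ids of sentences whose confidence falls below the threshold (assumed rule)."""
    return [s.sentence_id for s in sentences if llr_confidence(s) < threshold]
```

In this sketch a higher score means the annotated unit sequence explains the audio better than the alternative, so a sentence with a low score would be a candidate for manual inspection. The threshold is a free parameter that would have to be tuned on data with known annotation errors.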

Cited by 2 publications (2 citation statements)
References 4 publications
“…The set contains approximately 2600 sentence-like segments, all with manual reference texts from their respective corpora. We use the free speech test set of NB Tale [19], which contains spontaneous narrative monologues, and a part of the test set of Rundkast [20], which includes broadcast radio news shows. Lastly, we take the test set from the Norwegian Parliamentary Speech Corpus (NPSC) [21].…”
Section: Survey on Human Perception of ASR Transcription Quality (mentioning)
confidence: 99%
“…In this paper, we proposed to locate the heteronym annotation errors from the annotation verification view, and use log likelihood ratio [2] of various features to locate the heteronym annotation errors in a Mandarin speech synthesis database. For that Mandarin is a tonal language, and heteronym can be pronounced differently either in the phone itself or just in tone, we divide the heteronyms into 2 categories: class A is the heteronym pronounced differently just in tone, and class B is the ones pronounced different in phone and either the same or different in tone.…”
Section: Introduction (mentioning)
confidence: 99%