2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
DOI: 10.1109/asru.2015.7404808

Phonetically-oriented word error alignment for speech recognition error analysis in speech translation

Abstract: We propose a variation to the commonly used Word Error Rate (WER) metric for speech recognition evaluation which incorporates the alignment of phonemes, in the absence of time boundary information. After computing the Levenshtein alignment on words in the reference and hypothesis transcripts, spans of adjacent errors are converted into phonemes with word and syllable boundaries and a phonetic Levenshtein alignment is performed. The phoneme alignment information is used to correct the word alignment labels in e…
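The first stage the abstract describes (a word-level Levenshtein alignment, followed by collecting spans of adjacent errors) can be sketched roughly as below. This is a minimal illustration with unit edit costs, not the paper's implementation: the function names `align`, `error_spans`, and `wer` are hypothetical, and the costs and tie-breaking order are assumptions. The later phoneme-level realignment of each span is not shown.

```python
from typing import List, Tuple

Op = Tuple[str, str, str]  # (label, reference token, hypothesis token)

def align(ref: List[str], hyp: List[str]) -> List[Op]:
    """Unit-cost Levenshtein alignment; labels are C, S, D, I (assumed scheme)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right corner, preferring the diagonal move.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            label = "C" if ref[i - 1] == hyp[j - 1] else "S"
            ops.append((label, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("D", ref[i - 1], ""))
            i -= 1
        else:
            ops.append(("I", "", hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops

def error_spans(ops: List[Op]) -> List[List[Op]]:
    """Group maximal runs of adjacent non-correct operations."""
    spans: List[List[Op]] = []
    current: List[Op] = []
    for op in ops:
        if op[0] == "C":
            if current:
                spans.append(current)
                current = []
        else:
            current.append(op)
    if current:
        spans.append(current)
    return spans

def wer(ref: List[str], hyp: List[str]) -> float:
    """Standard WER: (S + D + I) divided by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return sum(op[0] != "C" for op in align(ref, hyp)) / len(ref)
```

Each span returned by `error_spans` is the unit that, per the abstract, would then be converted to phonemes and realigned; how the phoneme alignment corrects the word labels is described in the paper itself rather than here.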

Cited by 12 publications (10 citation statements)
References 9 publications
“…Two key features are the use of a digital speech pronouncing dictionary for automated derivation of the phonemes from stimuli and responses, and the modification of the Levenshtein minimum edit distance via dynamic programming for automated alignment of phonemes. Traditionally, speech pronouncing dictionaries have been used in speech recognition research for purposes such as aligning phonemes in speech-to-text translation (38). Here, the open source CMUDict is used for aligning phonemes in text-to-text comparison.…”
Section: Discussion (mentioning)
confidence: 99%
“…Figure 5 shows examples of alignments and phoneme F1-scores for four challenging stimulus-response pairs. The first two are examples of the consequences of insertion and deletion [see Table 4 from (38)]. The third example is one of phonemic ambiguity but with different alignments caused by one substitution.…”
Section: Methods (mentioning)
confidence: 99%
“…Training the Error Model: We use the seed set $S$ to train $E$. Using a phone-aware edit distance algorithm as in [18], we align the phoneme representation $\hat{p}_i$ of each hypothesis $\hat{y}_i$ with the phoneme representation $p_i$ of the corresponding reference $y_i$. Using these alignments, we obtain the error sequence $e_i$ such that $e_i^j = 0$ if the token aligned with $p_i$ at position $j$ is the same as $p_i^j$.…”
Section: The Error Detection Model (mentioning)
confidence: 99%
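The error sequence in the statement above can be illustrated with a small sketch. It makes several assumptions not confirmed by the quoted text: it uses a plain unit-cost Levenshtein alignment rather than the phone-aware edit distance of [18], it sets the entry to 1 whenever the aligned token differs, it marks deleted reference phonemes as 1, and it ignores inserted hypothesis phonemes because they have no reference position. The name `error_sequence` is hypothetical.

```python
from typing import List

def error_sequence(ref_phones: List[str], hyp_phones: List[str]) -> List[int]:
    """Return one 0/1 flag per reference phoneme (0 = aligned token matches)."""
    n, m = len(ref_phones), len(hyp_phones)
    # dist[i][j] = unit-cost edit distance between ref_phones[:i] and hyp_phones[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dist[i - 1][j - 1] + (ref_phones[i - 1] != hyp_phones[j - 1])
            dist[i][j] = min(diag, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Assumption: every reference position is an error until a match is found.
    errors = [1] * n
    i, j = n, m
    while i > 0 and j > 0:
        same = ref_phones[i - 1] == hyp_phones[j - 1]
        if dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            errors[i - 1] = 0 if same else 1  # match or substitution
            i, j = i - 1, j - 1
        elif dist[i][j] == dist[i - 1][j] + 1:
            i -= 1  # deletion: reference phoneme has no counterpart, stays 1
        else:
            j -= 1  # insertion: no reference position to flag (assumption)
    return errors
```

For example, with ARPAbet-style symbols, `error_sequence(["K", "AE", "T"], ["K", "AH", "T"])` flags only the middle phoneme.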
“…Voice over IP professionals and telecommunications professionals generally use the word IVR. Also, voice response is used sometimes [9].…”
Section: Related Work (mentioning)
confidence: 99%