Proceedings of the 2nd ACM Symposium on Computing for Development 2012
DOI: 10.1145/2160601.2160618

Discriminative pronunciation learning for speech recognition for resource scarce languages

Abstract: In this paper, we describe a method to create speech recognition capability for small vocabularies in resource-scarce languages. By resource-scarce languages, we mean languages that have a small or economically disadvantaged user base and are therefore typically ignored by the commercial world. We use a high-quality, well-trained speech recognizer as our baseline to remove the dependence on large amounts of audio data for an accurate acoustic model. Using cross-language phoneme mapping, the baseline recognizer effectively recogniz…
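The abstract's cross-language phoneme mapping can be illustrated, under heavy simplifying assumptions, as rewriting each target-language word in the phoneme inventory of the source-language recognizer so that the baseline recognizer can match it. Everything in this sketch is hypothetical: the phoneme symbols, the table `PHONEME_MAP`, and the helpers `map_pronunciation` and `build_lexicon`. A hand-written table is used only for illustration and is not presented as the paper's method; the paper's title indicates that pronunciations are learned.

```python
# Hypothetical mapping from target-language phonemes to the closest phonemes
# of the source language the baseline recognizer was trained on.
# Both inventories and every pairing below are illustrative assumptions.
PHONEME_MAP: dict[str, list[str]] = {
    "a": ["aa"],
    "e": ["eh"],
    "i": ["iy"],
    "o": ["ow"],
    "u": ["uw"],
    "b": ["b"],
    "k": ["k"],
    "m": ["m"],
    "n": ["n"],
    "s": ["s"],
    "t": ["t"],
    "ny": ["n", "y"],  # a target phoneme with no single source equivalent
}


def map_pronunciation(target_phonemes: list[str]) -> list[str]:
    """Rewrite a target-language pronunciation using source-language phonemes."""
    source: list[str] = []
    for ph in target_phonemes:
        if ph not in PHONEME_MAP:
            raise KeyError(f"no source-language mapping for phoneme {ph!r}")
        # One target phoneme may expand into several source phonemes.
        source.extend(PHONEME_MAP[ph])
    return source


def build_lexicon(words: dict[str, list[str]]) -> dict[str, str]:
    """Map each vocabulary word to a space-separated source-phoneme string."""
    return {word: " ".join(map_pronunciation(p)) for word, p in words.items()}
```

For example, `build_lexicon({"nyama": ["ny", "a", "m", "a"]})` returns `{"nyama": "n y aa m aa"}`; the one-to-many entry for `"ny"` shows why such a mapping may need to expand a single target phoneme into several source phonemes.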

Cited by 13 publications (9 citation statements)
References: 5 publications
“…Our work builds directly on previous work, namely the Speech-based Automated Learning of Accent and Articulation Mapping (SALAAM) [12,11] algorithm, as implemented in the open source tool Lex4All [14,13].…”
Section: Related Work (mentioning)
confidence: 99%
“…The primary idea behind the SALAAM technique is to find the best pronunciation sequence for a given word in a target language from one or more audio samples by using a source language speech recognizer to perform phone decoding (decoding by phoneme) [10]. Since most commercial speech recognizers do not directly support phone decoding, the SALAAM technique uses a specially-designed grammar to mimic phone decoding [10,13]. This is achieved by creating a recognition grammar representing a phoneme super wildcard to guide pronunciation discovery.…”
Section: Related Work (mentioning)
confidence: 99%
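To make the "phoneme super wildcard" concrete, here is a minimal sketch that emits a W3C SRGS 1.0 XML grammar accepting any sequence of 1 to `max_len` tokens drawn from a phoneme set. The function name `super_wildcard_grammar`, the default `max_len`, and the assumption that each token stands for exactly one source-language phoneme are hypothetical: real recognizers need an engine-specific way to bind a token to a phoneme, which is not shown, and this is not the SALAAM or Lex4All implementation.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def super_wildcard_grammar(phonemes: list[str], max_len: int = 10,
                           lang: str = "en-US") -> str:
    """Return an SRGS XML grammar accepting any sequence of 1..max_len
    tokens drawn from `phonemes` (a 'phoneme super wildcard')."""
    if not phonemes or max_len < 1:
        raise ValueError("need at least one phoneme and max_len >= 1")
    # Literal xmlns / xml:lang attributes are written out verbatim by ElementTree.
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": lang,
        "root": "wildcard",
    })
    rule = ET.SubElement(grammar, "rule", {"id": "wildcard", "scope": "public"})
    # A single slot that may repeat between 1 and max_len times ...
    slot = ET.SubElement(rule, "item", {"repeat": f"1-{max_len}"})
    # ... where each repetition matches any one phoneme from the inventory.
    choices = ET.SubElement(slot, "one-of")
    for ph in phonemes:
        # Assumption: each token denotes exactly one source-language phoneme.
        ET.SubElement(choices, "item").text = ph
    return ET.tostring(grammar, encoding="unicode")
```

Decoding an audio sample against such a grammar forces the recognizer to return its best-matching phoneme string, which can then serve as a candidate pronunciation for the word.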
“…If the frequency range of an input speech signal is from 0 to f, then the frequencies above f/2 are cut off by the low-pass filter, while the frequencies below f/2 are cut off by the high-pass filter. Filtering a speech signal is mathematically equivalent to convolving the signal [22][23] with the filter's impulse response (see Eq. (10) and Eq.…”
Section: Proposed Technique for Feature Vector Extraction (mentioning)
confidence: 99%
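The half-band split and the filtering-as-convolution statement can be sketched as follows: each output sample is y[n] = Σ_k x[k]·h[n−k], where h is the filter's impulse response. The Haar filter pair is an assumption (the excerpt does not name the filters), as are the helper names `convolve` and `split_bands`; a full wavelet-style decomposition would also downsample each band by 2, which is omitted here.

```python
import math


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Full discrete convolution: y[n] = sum_k x[k] * h[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Assumed Haar analysis filters; both are half-band filters splitting at f/2.
S = 1 / math.sqrt(2)
LOW_PASS = [S, S]    # passes the band below f/2
HIGH_PASS = [S, -S]  # passes the band above f/2


def split_bands(x: list[float]) -> tuple[list[float], list[float]]:
    """Filter x into low- and high-frequency halves by convolution."""
    return convolve(x, LOW_PASS), convolve(x, HIGH_PASS)
```

Feeding a constant (0 Hz) signal leaves the interior of the high-pass output at zero, while an alternating +1, −1 signal (the highest representable frequency) is removed by the low-pass filter; the first and last output samples are edge effects of the full convolution.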