N-best entropy based data selection for acoustic modeling

Itoh, Nobuyasu; Sainath, Tara N.; Jiang, Dan; Zhou, Jie; Ramabhadran, Bhuvana

doi:10.1109/icassp.2012.6288828

Cited by 23 publications

(20 citation statements)

References 13 publications


“…Similarly, according to the distribution of context-dependent HMM states in a development set, Siohan [28], [29] proposed to select data for acoustic modeling. Itoh et al [27] suggested that when selecting acoustic data, the informativeness and representativeness of the data should be assessed at the same time.…”

Section: Target Language Acoustic Data Selection (mentioning)

confidence: 99%

Submodular Based Unsupervised Data Selection

Zhang

2018

IEICE Trans. Inf. & Syst.


SUMMARY: Automatic speech recognition (ASR) and keyword search (KWS) have increasingly found their way into our everyday lives, and their success comes down to many factors. Among these, the large scale of speech data used for acoustic modeling is the key factor. However, it is difficult and time-consuming to acquire large-scale transcribed speech data for some languages, especially low-resource languages. Thus, under low-resource conditions, the choice of which data to transcribe for acoustic modeling becomes important for improving the performance of ASR and KWS. There are two ways of using acoustic data for acoustic modeling: one uses target language data, and the other uses large-scale data from other source languages for cross-lingual transfer. In this paper, we propose approaches for efficiently selecting acoustic data for acoustic modeling. For target language data, a submodular based unsupervised data selection approach is proposed, which can select more informative and representative utterances for manual transcription. For data from other source languages, two approaches are proposed: a submodular multilingual data selection approach based on data that is highly misclassified as the target language, and a knowledge based group multilingual data selection approach. Using the selected multilingual data to train a multilingual deep neural network for cross-lingual transfer can improve the ASR and KWS performance of the target language. Compared with a language identification based multilingual data selection approach, our proposed approach also achieves better results. We also analyze and compare the influence of the language factor and the acoustic factor on the performance of ASR and KWS. The influence of different scales of target language data on the performance of ASR and KWS under mono-lingual and cross-lingual conditions is also compared and analyzed, and some significant conclusions are drawn.

key words: keyword spotting, submodular, multilingual data selection, language identification, recurrent neural network long short term memory


“…[6] proposed a lattice-entropy based measure and selecting utterances based on global entropy reduction. [7] observed that lattice-entropy is correlated with the utterance length and showed N-best entropy to be an empirically better criterion. In this work, we also use an entropy-based measure as an informative criterion for data selection.…”

Section: Uncertainty Based Informativeness Criterion (mentioning)

confidence: 99%
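
The N-best entropy criterion mentioned in the statement above can be illustrated with a minimal sketch. This is not the exact formulation from Itoh et al., which the citation statement does not reproduce; it assumes each N-best hypothesis comes with a combined log-score, normalizes those scores into a posterior over the list, and computes the Shannon entropy. The function name `nbest_entropy` and the input format are hypothetical.

```python
import math


def nbest_entropy(log_scores):
    """Shannon entropy (in nats) of the posterior over N-best hypotheses.

    log_scores: one combined log-score per hypothesis. This input format is
    an assumption made for illustration, not something the source specifies.
    """
    if not log_scores:
        raise ValueError("need at least one hypothesis")
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    posteriors = [w / total for w in weights]
    # Hypotheses whose posterior underflows to zero contribute nothing.
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)
```

Under this sketch, an utterance whose hypotheses are nearly equally likely gets a high entropy and would be ranked as more informative, while an utterance dominated by a single hypothesis gets an entropy near zero.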

“…The difference in cross-entropy is used as a measure of relevance and the average entropy based on confusion networks is used as a measure of uncertainty or informativeness. Both the scores are in log-scale and we use a simple weighted combination to combine both the scores [7]. The final score is given by…”

Section: Score Combination (mentioning)

confidence: 99%
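
The statement above is cut off before the actual formula, so the following is only a hypothetical illustration of what a "simple weighted combination" of a relevance score and an uncertainty score could look like. It assumes a linear interpolation with a single weight; the names `combined_score`, `select_top_k`, and `weight` are invented for this sketch and are not taken from the cited paper.

```python
def combined_score(relevance, uncertainty, weight=0.5):
    """Linear interpolation of two log-scale scores.

    This interpolation form is an assumption; the formula used in the cited
    paper is not shown in the citation statement.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * relevance + (1.0 - weight) * uncertainty


def select_top_k(utterances, k, weight=0.5):
    """Return the ids of the k highest-scoring utterances.

    utterances: list of (utt_id, relevance, uncertainty) tuples
    (a hypothetical input layout).
    """
    ranked = sorted(
        utterances,
        key=lambda u: combined_score(u[1], u[2], weight),
        reverse=True,
    )
    return [utt_id for utt_id, _, _ in ranked[:k]]
```

With `weight=1.0` the ranking depends on relevance alone, and with `weight=0.0` on uncertainty alone; intermediate values trade the two criteria off against each other.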

“…It has been applied in natural language processing [2], spoken language understanding [3], speech recognition [4,5,6,7], etc. Many of the approaches relied on some form of uncertainty based measure for data selection.…”

Section: Introduction (mentioning)

confidence: 99%

“…Confidence scores are typically used for active learning in speech recognition [8] to predict uncertainty. Lattice [6] and N-best [7] based techniques have been proposed to avoid outliers with 1-best hypothesis. Representative criterion in addition to uncertainty have also been shown to improve data selection in some cases [9,7].…”

Section: Introduction (mentioning)

confidence: 99%


Active learning for accent adaptation in Automatic Speech Recognition

Nallasamy; Metze; Schultz

2012

2012 IEEE Spoken Language Technology Workshop (SLT)


We introduce a novel active learning algorithm for speech recognition in the context of accent adaptation. We adapt a source recognizer on the target accent by selecting a matched subset of utterances from a large, untranscribed and multiaccented corpus for human transcription. Traditionally, active learning in speech recognition has relied on uncertainty based sampling to choose the most informative samples for manual labeling. Such an approach doesn't include explicit relevance criterion during data selection, which is crucial for choosing utterances to match the target accent, from datasets with wide-ranging speakers of different accents. We formulate a cross-entropy based relevance measure to complement uncertainty based sampling for active learning to aid accent adaptation. We evaluate the algorithm on two different setups for Arabic and English accents and show that our approach performs favorably to conventional data selection. We analyze the results to show the effectiveness of our approach in finding the most relevant subset of utterances for improving the speech recognizer on the target accent.


Initial decoding with minimally augmented language model for improved lattice rescoring in low resource ASR

Murthy; Sitaram

2024

Sādhanā

