Interspeech 2018
DOI: 10.21437/interspeech.2018-1085

Automatic Speech Recognition System Development in the "Wild"

Abstract: The standard framework for developing an automatic speech recognition (ASR) system is to generate training and development data for building the system, and evaluation data for the final performance analysis. All the data is assumed to come from the domain of interest. Though this framework is matched to some tasks, it is more challenging for systems that are required to operate over broad domains, or where the ability to collect the required data is limited. This paper discusses ASR work performed under the I…

Cited by 20 publications (17 citation statements)
References: 22 publications

“…For these results, no wideband adaptation was used, so the wideband data was downsampled to use the same features as the CTS data. Our results compare favourably to previously published results on the same dataset [21,23].…”
Section: Methods (supporting)
confidence: 90%
“…The original work was demonstrated on in-domain data-sets (conversational speech). Nevertheless, similar techniques have also been employed to use a seed model trained on out-of-domain data followed by SSL on in-domain data [1,17,18]. The latter is often obtained from data crawled from the web.…”
Section: Semi-supervised Training Using LF-MMI (mentioning)
confidence: 99%
“…A total of 400 hours each for Lithuanian and Bulgarian are considered for the experiments. The results are reported on two sets: the dev set, which is part of the official Babel release, and the IARPA MATERIAL Analysis Pack 1 (Analysis) [17,22]. The dev set consists of only CS while the Analysis set contains the three domains: CS, NB and TB.…”
Section: Materials Data-set Setup (mentioning)
confidence: 99%
“…Reference [12] introduces a crawler for YouTube to curate training dataset for ASR and demonstrates a 40% improvement in Word Error Rate (WER) on the Wall Street Journal test dataset. In [13], the authors address the problem of operating ASRs in a wide range of developing languages, such as Swahili, by proposing to automatically scrape audio from YouTube and Voice of America and use ASR system confidence scores as the primary metric for the model components. The creators of the VoxCeleb1 and VoxCeleb2 datasets [14], [15], crawled YouTube to construct the datasets, which are now widely used in the field of Speaker Recognition [16].…”
Section: Related Work (mentioning)
confidence: 99%