6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018) 2018
DOI: 10.21437/sltu.2018-38

ASR-Free CNN-DTW Keyword Spotting Using Multilingual Bottleneck Features for Almost Zero-Resource Languages

Abstract: We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting. This forms part of a United Nations effort using keyword spotting to support humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We use 1920 isolated keywords (40 types, 34 minutes) as exemplars for dynamic time warping (DTW) template matching, which is performed on a much larger body of untranscribed speech. These DTW costs are used as targets for a convolutional neural ne…
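
The DTW template matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean frame distance, the normalisation by the summed sequence lengths, the query-length sliding window, and the names `dtw_cost` and `best_window_cost` are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def dtw_cost(query, segment):
    """Length-normalised DTW alignment cost between two feature sequences.

    Each sequence is a list of equal-dimension feature vectors (lists of floats).
    Assumption: Euclidean frame distance, normalised by len(query) + len(segment).
    """
    n, m = len(query), len(segment)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    # acc[i][j] holds the cheapest cost of aligning the first i query frames
    # with the first j segment frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def best_window_cost(query, utterance, step=1):
    """Slide a query-length window over the utterance; return the minimum DTW cost.

    Assumption: the window width equals the query length (hypothetical choice).
    """
    width = len(query)
    if len(utterance) < width:
        return dtw_cost(query, utterance)
    return min(
        dtw_cost(query, utterance[start:start + width])
        for start in range(0, len(utterance) - width + 1, step)
    )
```

Each window costs on the order of (query length × window length) distance computations, and this is repeated for every exemplar and every window position, which is consistent with the citing papers' remark that DTW is expensive for large-scale search.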

Cited by 9 publications (6 citation statements)
References 37 publications
“…In QbE-STD task, the area under ROC curve (AUC) [18,19] is used to evaluate the accuracy of detection. The ROC curve is composed of false positive rate (FPR) and true positive rate (TPR) which are defined by:…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
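
The quoted statement elides the definitions it refers to. As a hedged illustration only, the sketch below uses the standard textbook definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN), and computes AUC as the probability that a randomly chosen positive trial scores higher than a randomly chosen negative one. The function names `tpr_fpr` and `roc_auc` are hypothetical, and this is not necessarily the formulation used in the citing paper.

```python
def tpr_fpr(scores, labels, threshold):
    """Return (TPR, FPR) when trials with score >= threshold count as detections."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative trial")
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative trial")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Both functions assume that a higher score means a more confident detection; if a detector outputs costs where lower means a better match, negate them first.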
“…Word spotting traditionally is audio exemplar matching against spans of raw audio (Myers et al, 1980). It has been shown to be feasible in low resource scenarios using neural approaches (Menon et al, 2018b,a). Le Ferrand et al (2020) describes several plausible speech representations suited for low-resource word spotting.…”
Section: System Architecture (mentioning)
confidence: 99%
“…A key disadvantage is that DTW is computationally very expensive and usually not feasible for large-scale continuous application (Menon et al, 2018b). Furthermore, for DTW the choice of input features has a greater impact on performance than, for example, for ASR, since in the latter case acoustic models can learn to adapt to different representations (Menon et al, 2018a).…”
Section: Introduction (mentioning)
confidence: 99%
“…In this way we take advantage of CNN-based searching, which is computationally efficient since it does not require alignment, to perform DTW-based matching, which requires a minimum of labelled data. We first proposed this approach, which we will refer to as CNN-DTW keyword spotting, in (Menon et al, 2018b) and later extended it in (Menon et al, 2018a) and in (Menon et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%