2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2015
DOI: 10.1109/icassp.2015.7178996

Low-resource keyword search strategies for Tamil

Abstract: We propose strategies for a state-of-the-art keyword search (KWS) system developed by the SINGA team in the context of the 2014 NIST Open Keyword Search Evaluation (OpenKWS14) using conversational Tamil provided by the IARPA Babel program. To tackle low-resource challenges and the rich morphological nature of Tamil, we present highlights of our current KWS system, including: (1) Submodular optimization data selection to maximize acoustic diversity through Gaussian component indexed N-grams; (2) Keyword-aware la…

Cited by 32 publications (14 citation statements)
References 24 publications
“…We built a morpheme-based KWS system since morpheme was shown to be effective for detecting OOV keywords [5,27,28]. We adopted the Morfessor toolkit [29] to segment both word-based dictionary and word transcriptions into morpheme units.…”
Section: Abstractearch System
Citation type: mentioning
confidence: 99%
“…This work focuses on the Spoken Term Detection (STD) [1] or Keyword Search (KWS) [2] task, which detects the presence of a textual keyword in an audio corpus. Generally, a two-stage approach is utilized for a KWS system [3][4][5][6]. Specifically, audio files of the corpus are first segmented and transcribed into lattices by an automatic speech recognition (ASR).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Best performance in the NIST Open KWS 2013 evaluation is ATWV=0.6248 [110] under the Full Language Pack (FullLP) condition, for which 20 h of word-transcribed scripted speech, 80 h of word-transcribed CTS, and a pronunciation lexicon were given to participants. In the works describing systems on the surprise language (i.e., Tamil) of the Open KWS 2014 evaluation [53,92,94,111-117], ATWV=0.5802 is the best performance obtained under the FullLP condition, for which 60 h of transcribed speech and a pronunciation lexicon were given to participants.…”
Section: Comparison To Other Evaluations
Citation type: mentioning
confidence: 99%
“…Other subword units, namely morphemes, are effective for morphologically rich languages such as Turkish or Tamil [122]. Morphemes are the smallest grammatical units, i.e.…”
Section: Data-driven Subword Units
Citation type: mentioning
confidence: 99%
“…Various types of subwords have been studied, including linguistic units such as phones [57,58,60], syllables [62,63,138], morphemes [65,122], characters [62]; and data-driven units such as word-fragments [117], multigrams [94], graphones [107] and particles [101].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
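
The citation statements above compare systems by ATWV (actual term-weighted value), the metric used in the NIST OpenKWS evaluations. As a rough illustration of how such a figure is obtained, the sketch below computes the term-weighted value from per-keyword counts at a system's chosen decision threshold. It assumes the usual NIST definition (false-alarm weight beta = 999.9, one non-target trial per second of speech, keywords absent from the reference skipped). The class and function names are hypothetical and are not taken from any of the cited papers.

```python
from dataclasses import dataclass
from typing import Iterable

# Weight on false alarms in the NIST definition (assumed here: 999.9).
BETA = 999.9


@dataclass
class KeywordCounts:
    """Hypothetical container for one keyword's scoring counts."""
    n_true: int         # occurrences of the keyword in the reference
    n_correct: int      # detections that match a reference occurrence
    n_false_alarm: int  # detections with no matching reference occurrence


def term_weighted_value(counts: Iterable[KeywordCounts],
                        speech_seconds: float,
                        beta: float = BETA) -> float:
    """Return 1 - mean(P_miss + beta * P_fa) over keywords with n_true > 0."""
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no keyword occurs in the reference")
    penalty = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        # Non-target trials: one per second of speech, minus true occurrences.
        p_fa = c.n_false_alarm / (speech_seconds - c.n_true)
        penalty += p_miss + beta * p_fa
    return 1.0 - penalty / len(scored)


if __name__ == "__main__":
    # Made-up counts for two keywords over 10 hours (36000 s) of speech.
    example = [KeywordCounts(4, 3, 1), KeywordCounts(2, 2, 0)]
    print(f"TWV = {term_weighted_value(example, 36000.0):.4f}")
```

Under this definition a system that finds every occurrence with no false alarms scores 1.0, and a system that outputs nothing scores 0.0. Because beta is large, a small number of false alarms on a rare keyword can cost as much as a miss.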