2021
DOI: 10.48550/arxiv.2110.09264
Preprint

Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages

Abstract: Building Spoken Language Understanding (SLU) systems that do not rely on language-specific Automatic Speech Recognition (ASR) is an important yet less explored problem in language processing. In this paper, we present a comparative study aimed at employing a pre-trained acoustic model to perform SLU in low resource scenarios. Specifically, we use three different embeddings extracted using Allosaurus, a pre-trained universal phone decoder: (1) Phone, (2) Panphone, and (3) Allo embeddings. These embeddings are th…

Cited by 1 publication (4 citation statements, published 2022)
References 7 publications
“…This is why the pipeline that uses phonetic transcriptions outperforms Wav2Vec-based embeddings. Yadav et al. (2021) show that this is true even when Allosaurus embeddings are compared to phonetic transcriptions generated by Allosaurus. As the amount of available data decreases, intent classification systems built on phonetic transcriptions begin to outperform systems based on Allosaurus embeddings, showing that projecting input speech into phonetic transcriptions is the most effective way to use the scarce labelled data in these compounded low-resource settings.…”
Section: Experiments With Phonetic Transcriptions Using Allosaurus
confidence: 90%
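The pipeline the statement above describes — classify intents from phone transcriptions rather than from dense embeddings — can be sketched minimally as a bag-of-phone-n-grams classifier. This is an illustrative stand-in, not the cited paper's actual model; the phone strings, intents, and nearest-centroid classifier below are all invented for the sketch.

```python
# Sketch: intent classification over Allosaurus-style phone strings.
# Phone sequences and intent labels here are invented for illustration.
from collections import Counter

def phone_ngrams(phones, n=2):
    """Bag of phone n-grams from a space-separated phone string."""
    toks = phones.split()
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class CentroidIntentClassifier:
    """Nearest-centroid over phone n-gram counts; a simple stand-in
    for the classifiers compared in the paper, not their architecture."""
    def __init__(self):
        self.centroids = {}

    def fit(self, examples):
        for phones, intent in examples:
            self.centroids.setdefault(intent, Counter()).update(phone_ngrams(phones))

    def predict(self, phones):
        feats = phone_ngrams(phones)
        return max(self.centroids, key=lambda i: cosine(feats, self.centroids[i]))

# Tiny invented training set: phone strings for two intents.
train = [
    ("t ɜ n ɔ n l aɪ t", "lights_on"),
    ("l aɪ t s ɔ n", "lights_on"),
    ("p l eɪ m j u z ɪ k", "play_music"),
    ("m j u z ɪ k p l iː z", "play_music"),
]
clf = CentroidIntentClassifier()
clf.fit(train)
pred = clf.predict("t ɜ n ð ə l aɪ t ɔ n")
```

Because the features are discrete phone n-grams, each labelled utterance contributes hard evidence to exactly one centroid, which is one intuition for why transcription-based pipelines degrade more gracefully than embedding-based ones as data shrinks.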
“…Allosaurus is trained to recognize and transcribe input speech into the series of phones contained in the utterance, providing superior representations of input audio that can also be used for languages linguistically distant from high-resource languages like English. Yadav et al. (2021) show that using embeddings generated from Allosaurus to encode speech content outperforms previous state-of-the-art methods for Sinhala and Tamil by large margins, while maintaining high performance on high-resource languages like English (99.08% classification accuracy on a 31-class intent classification problem). But the performance drops as the dataset size decreases and is not optimal for the task-specific low-resource settings that we are dealing with in this paper.…”
Section: Related Work
confidence: 93%
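The embedding-based pipeline contrasted above can likewise be sketched: frame-level vectors (as an Allosaurus encoder might emit) are mean-pooled into a single utterance vector and classified by nearest centroid. The dimensions, vectors, and classifier are all invented for illustration; the paper's actual embedding extraction and classifier are not reproduced here.

```python
# Sketch: pool frame-level acoustic embeddings into one utterance
# vector, then classify by nearest centroid. All values are toy data.
def mean_pool(frames):
    """Average a list of equal-length frame embeddings into one vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(utterances):
    """utterances: list of (frame_list, intent) pairs -> intent centroids."""
    pooled = {}
    for frames, intent in utterances:
        pooled.setdefault(intent, []).append(mean_pool(frames))
    return {intent: mean_pool(vecs) for intent, vecs in pooled.items()}

def predict(centroids, frames):
    v = mean_pool(frames)
    return min(centroids, key=lambda i: sq_dist(v, centroids[i]))

# Toy 3-dimensional frame embeddings for two intents.
train = [
    ([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]], "lights_on"),
    ([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]], "play_music"),
]
centroids = fit_centroids(train)
pred = predict(centroids, [[0.8, 0.2, 0.1]])
```

With very few labelled utterances, each centroid is an average of a handful of noisy continuous vectors, which matches the quoted observation that the embedding pipeline's accuracy drops as dataset size decreases.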