Spoken Language Identification Using Bidirectional LSTM Based LID Sequential Senones

Muralikrishna, H.; Sapra, Pulkit; Jain, Anuksha; Dinesh, Dileep Aroor

doi:10.1109/asru46091.2019.9003947

Cited by 10 publications (7 citation statements)

References 16 publications


“…The architecture of the embedding extractor is shown in Fig. 2, which is motivated by the network in [15] and [16]. It contains two bidirectional long short-term memory (BLSTM) layers with 256 and 64 nodes respectively in first and second layer. These BLSTM layers analyze the input sequence of BNFs by dividing it into fixed-length chunks (with 50% overlap between successive chunks) to generate LID-seq-senones [15]. These LID-seq-senones are nothing but the activation obtained at the output of second BLSTM layer for each chunk of BNF vectors [15]. The mean and standard deviation of these LID-seq-senones are then computed using a statistics pooling layer.…”

Section: Feature Extractor Block For Obtaining Fixed-length U-vector (mentioning)
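The chunking and pooling steps in the quoted passage can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names, chunk length, feature dimension, output size `H`, and the stand-in used in place of the two BLSTM layers are all hypothetical. Only the 50% overlap between chunks and the mean/standard-deviation pooling come from the text.

```python
import numpy as np


def chunk_sequence(bnf: np.ndarray, chunk_len: int) -> np.ndarray:
    """Split a (T, D) sequence of bottleneck features (BNFs) into
    fixed-length chunks with 50% overlap between successive chunks.

    Assumption: trailing frames that do not fill a whole chunk are dropped.
    Returns an array of shape (num_chunks, chunk_len, D).
    """
    if chunk_len < 2 or chunk_len % 2:
        raise ValueError("chunk_len must be an even number >= 2")
    hop = chunk_len // 2  # 50% overlap between successive chunks
    num_frames = bnf.shape[0]
    if num_frames < chunk_len:
        raise ValueError("sequence is shorter than one chunk")
    starts = range(0, num_frames - chunk_len + 1, hop)
    return np.stack([bnf[s:s + chunk_len] for s in starts])


def statistics_pooling(senones: np.ndarray) -> np.ndarray:
    """Map (num_chunks, H) chunk-level vectors to a fixed-length (2H,)
    vector: the mean followed by the standard deviation over chunks."""
    return np.concatenate([senones.mean(axis=0), senones.std(axis=0)])


# Usage with random stand-in data (all sizes are hypothetical).
rng = np.random.default_rng(0)
bnf = rng.standard_normal((200, 80))        # 200 frames of 80-dim BNFs
chunks = chunk_sequence(bnf, chunk_len=50)  # hop of 25 frames -> 7 chunks

# Placeholder for the two BLSTM layers: one H-dim vector per chunk.
# A real system would run each chunk through the BLSTM layers and keep the
# second layer's output; this sketch only projects the chunk mean.
H = 64
W = rng.standard_normal((bnf.shape[1], H))
senones = np.tanh(chunks.mean(axis=1) @ W)  # (7, 64) stand-in LID-seq-senones

u = statistics_pooling(senones)             # fixed-length (128,) vector
print(chunks.shape, senones.shape, u.shape)
```

The number of chunks grows with utterance length, while the pooled vector always has 2H entries; this is how statistics pooling turns a variable-length utterance into a fixed-length representation.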


Spoken Language Identification in Unseen Target Domain Using Within-Sample Similarity Loss

Muralikrishna, Kapoor, Dinesh, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


State-of-the-art spoken language identification (LID) networks are vulnerable to channel mismatch, which arises from differences between the channels used to obtain the training and testing samples. The effect of channel mismatch is severe when the training dataset contains very limited channel diversity. One way to address channel mismatch is to learn a channel-invariant representation of the speech using adversarial multi-task learning (AMTL). However, the AMTL approach cannot be used when the training samples do not contain the corresponding channel labels. To address this, we propose an auxiliary within-sample similarity loss (WSSL), which encourages the network to suppress the channel-specific contents in the speech. Unlike AMTL, WSSL does not require any channel labels. Specifically, WSSL gives the similarity between a pair of embeddings of the same sample obtained by two separate embedding extractors. These embedding extractors are designed to capture similar information about the channel, but dissimilar LID-specific information in the speech. Furthermore, the proposed WSSL improves the noise robustness of the LID network by suppressing the background noise in the speech to some extent. We demonstrate the effectiveness of the proposed approach in both seen and unseen channel conditions on a set of datasets with significant channel mismatch.
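To make the idea of an auxiliary similarity loss concrete, here is a minimal NumPy sketch of how such a term could be computed and added to the LID objective. It is a hedged illustration, not the WSSL paper's definition: cosine similarity as the similarity measure, the additive weighting, the weight value, and all function names are assumptions, and the two embedding extractors themselves are omitted.

```python
import numpy as np


def within_sample_similarity(e1: np.ndarray, e2: np.ndarray, eps: float = 1e-8) -> float:
    """Batch-mean cosine similarity between two embeddings of the same
    samples. e1, e2: (B, D) outputs of two separate embedding extractors.
    Cosine similarity is an assumption; the abstract only says "similarity"."""
    dot = np.sum(e1 * e2, axis=1)
    norms = np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)
    return float(np.mean(dot / (norms + eps)))


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy for the main LID task."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def total_loss(logits, labels, e1, e2, weight: float = 0.1) -> float:
    """LID loss plus a weighted auxiliary similarity term, both minimized.
    The additive combination and the default weight are hypothetical."""
    return cross_entropy(logits, labels) + weight * within_sample_similarity(e1, e2)


same = np.array([[1.0, 0.0], [0.0, 2.0]])
orth_a = np.array([[1.0, 0.0]])
orth_b = np.array([[0.0, 3.0]])
print(within_sample_similarity(same, same))      # close to 1.0
print(within_sample_similarity(orth_a, orth_b))  # 0.0
```

Minimizing the similarity term penalizes information that both extractors encode. Under the abstract's premise that the two extractors share channel information but not LID-specific information, this pushes the network to suppress the shared channel content without needing channel labels.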



“…The authors would like to thank Hugo Jair Escalante, Isabelle Guyon and Qiang Yang for guidance as advisors. The platform, automl.ai 6 , is built based on Codalab 7 , an web-based platform for machine learning competitions [26].…”

Section: Acknowledgements (mentioning)

“…In the past few decades, machine learning, especially deep learning, has achieved remarkable breakthroughs in a wide range of speech tasks, e.g., speech recognition [1,2], speaker verification [3,4,5], language identification [6,7] and emotion classification [8,9]. Each speech task has its own specific techniques in achieving the state-of-the-art results [3,6,8,10,11,12], which require efforts of a large number of experts. Thus, it is very difficult to switch between different speech tasks without human efforts.…”

Section: Introduction (mentioning)

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

Wang et al. 2020

Preprint


The AutoSpeech challenge calls for automated machine learning (AutoML) solutions that automate the process of applying machine learning to speech processing tasks. These tasks, which cover a wide variety of domains, are shown to the automated system in a random order. Each time the task is switched, information about the new task is conveyed through its corresponding training set. Thus, every submitted solution should contain an adaptation routine that adapts the system to the new task. Compared to the first edition, the 2020 edition introduces 1) more speech tasks, 2) noisier data in each task, and 3) a modified evaluation metric. This paper outlines the challenge and describes the competition protocol, datasets, evaluation metric, starting kit, and baseline systems.


Adversarially Trained Hierarchical Attention Network for Domain-Invariant Spoken Language Identification

Goswami, Muralikrishna, Dileep, et al. 2023

Lecture Notes in Computer Science

