Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages

Dey, Spandan; Saha, Goutam; Sahidullah, Md

doi:10.23919/eusipco54536.2021.9616273

Cited by 6 publications (13 citation statements); references 29 publications.


“…In another of our studies [46], the generalization capabilities of the standalone LID systems, trained with a single corpus, were investigated by cross-corpora evaluation. The three most widely used corpora, IIITH-ILSC, LDC2017S14, and IITKGP-MLILSC, were considered in this study.…”

Section: Das et al. (2020); citation type: mentioning

“…The pronunciations are professional, and the accent and dialects are standardized. Corpus bias can arise even from the recording device [46]. Therefore, in real-world scenarios, the stand-alone Indian LID systems trained with the smaller corpus can exhibit poor generalization.…”

Section: Generalization of LID Systems; citation type: mentioning

“…To assess the generalization issue, we have conducted a cross-corpora evaluation with the most widely used speech corpora for the Indian languages [46]. We have shown that LID systems trained with one corpus perform poorly when the test data comes from another speech corpus.…”

Section: Generalization of LID Systems; citation type: mentioning
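The cross-corpora protocol in these statements (train on one corpus, test on another) can be sketched as a train/test grid. This is a minimal sketch, not the study's implementation: `cross_corpora_grid`, `train_fn`, and `eval_fn` are hypothetical names, and the toy nearest-centroid classifier and synthetic corpora (`corpus_A` to `corpus_C`, each shifted by an assumed constant feature offset) stand in for the real corpora (IIITH-ILSC, LDC2017S14, IITKGP-MLILSC) and LID models.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

import numpy as np


def cross_corpora_grid(
    corpora: List[str],
    train_fn: Callable[[str], object],
    eval_fn: Callable[[object, str], float],
) -> Dict[Tuple[str, str], float]:
    """Train once per corpus, then score every (train corpus, test corpus) pair.

    Diagonal entries (train == test) are in-corpus scores; all other
    entries are cross-corpora scores.
    """
    models = {name: train_fn(name) for name in corpora}
    return {
        (train_c, test_c): eval_fn(models[train_c], test_c)
        for train_c, test_c in product(corpora, repeat=2)
    }


# Toy stand-in for real LID data (synthetic, hypothetical, illustration only).
# Each corpus shifts all features by a constant offset, mimicking a
# recording-condition bias; the two "languages" differ only in their means.
OFFSETS = {"corpus_A": 0.0, "corpus_B": 3.0, "corpus_C": 6.0}
LANG_MEANS = np.array([[-1.0, 0.0], [1.0, 0.0]])


def make_split(corpus: str, n_per_lang: int = 200, seed: int = 0):
    rng = np.random.default_rng(seed)
    shift = np.array([OFFSETS[corpus], 0.0])
    x = np.concatenate(
        [rng.normal(m + shift, 0.3, size=(n_per_lang, 2)) for m in LANG_MEANS]
    )
    y = np.repeat(np.arange(len(LANG_MEANS)), n_per_lang)
    return x, y


def train_centroids(corpus: str) -> np.ndarray:
    # "Training" = one mean feature vector per language on the training split.
    x, y = make_split(corpus, seed=1)
    return np.stack([x[y == k].mean(axis=0) for k in np.unique(y)])


def accuracy(centroids: np.ndarray, corpus: str) -> float:
    # Nearest-centroid decision on a separately seeded test split.
    x, y = make_split(corpus, seed=2)
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return float((dists.argmin(axis=1) == y).mean())


grid = cross_corpora_grid(list(OFFSETS), train_centroids, accuracy)
for (train_c, test_c), acc in grid.items():
    print(f"train={train_c} test={test_c} acc={acc:.2f}")
```

In this toy setting the in-corpus scores are high, while the cross-corpora scores fall to chance because the offset pushes every test sample to one side of the decision boundary. Real corpus mismatch (speakers, channels, sessions) is far more varied than a single constant shift.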

“…We have shown that LID systems trained with one corpus perform poorly when the test data comes from another speech corpus. Applying several feature post-processing techniques can help to improve the generalization for cross-corpora evaluation as well as in real-world scenarios [46]. The poor performance in the cross-corpora scenario is expected due to acoustic mismatch, session variability, differences in speaker characteristics, etc.…”

Section: Generalization of LID Systems; citation type: mentioning
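Feature post-processing is mentioned here only generically. As one assumed example (this statement does not name the specific techniques used in [46]), per-utterance cepstral mean and variance normalization (CMVN) removes a constant additive offset from cepstral features, which is approximately how a stationary recording channel appears in the cepstral domain. The function name `cmvn` and the random 13-dimensional features below are hypothetical.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coeffs), e.g. MFCCs.
    Each coefficient is shifted to zero mean and scaled to unit variance
    using statistics computed over this utterance's frames only.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


rng = np.random.default_rng(0)
utt = rng.normal(size=(300, 13))           # hypothetical 13-dim cepstral frames
channel_offset = rng.normal(size=(1, 13))  # stationary channel: same shift on every frame

normalized_clean = cmvn(utt)
normalized_shifted = cmvn(utt + channel_offset)
print(np.allclose(normalized_clean, normalized_shifted))  # True: the offset is removed
```

Per-utterance statistics are one design choice; per-speaker or sliding-window statistics are common alternatives, and CMVN alone cannot remove non-stationary mismatch such as changing background noise.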


An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Dey, Sahidullah, Saha (2022). ACM Trans. Asian Low-Resour. Lang. Inf. Process. Self-citation.


Automatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there are not many attempts to analytically review them collectively. In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field. In-depth analysis has been presented to emphasize the unique challenges of low-resource and mutual influences for developing LID systems in the Indian contexts. Several essential aspects of the Indian LID research, such as the detailed description of the available speech corpora, the major research contributions, including the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures, and the future research trends are discussed. This review work will help assess the state of the present Indian LID research by any active researcher or any research enthusiasts from related fields.


“…For efficient real-world deployment of the speech applications, improving the generalization of the front-end LID module is important. For LID systems, the generalized classifier should be robust against several non-lingual sources, such as speaker identity, gender, age, dialects, and accents, mismatches due to channel and background environments [17]. We can assume that diversity in non-lingual effects is expected to increase in larger speech corpora with greater diversity in data collection settings.…”

Section: Introduction; citation type: mentioning

Cross-corpora spoken language identification with domain diversification and generalization

Dey, Sahidullah, Saha (2023). Computer Speech & Language.


Importance of Supra-Segmental Information and Self-Supervised Framework for Spoken Language Diarization Task

Mishra, Prasanna (2022). Lecture Notes in Computer Science.


In a code-switched (CS) scenario, spoken language diarization (LD) is essential as a pre-processing system. Further, implicit frameworks are preferable to explicit ones, as they can be easily adapted to low- or zero-resource languages. Inspired by the speaker diarization (SD) literature, three frameworks based on (1) fixed segmentation, (2) change-point-based segmentation, and (3) E2E are proposed to perform LD. Initial exploration with the synthetic TTSF-LD dataset shows that using the x-vector as an implicit language representation with an appropriate analysis window length (N) can achieve performance on par with explicit LD. The best implicit LD performance, a Jaccard error rate (JER) of 6.38, is achieved by the E2E framework. However, with the E2E framework, the performance of implicit LD degrades to a JER of 60.4 on the practical Microsoft CS (MSCS) dataset. The difference in performance is mostly due to the different distributions of secondary-language monolingual segment durations in the MSCS and TTSF-LD datasets. Moreover, the shorter monolingual segments call for a small value of N to avoid segment smoothing. At the same time, with small N, the x-vector representation is unable to capture the required language discrimination due to acoustic similarity, as the same speaker speaks both languages. Therefore, to resolve this issue, a self-supervised implicit language representation is proposed in this study. Compared with the x-vector representation, the proposed representation provides a relative improvement of 63.9% and achieves a JER of 21.8 using the E2E framework.
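The JER figures above can be made concrete with a small sketch. It assumes frame-level language labels and that hypothesis labels already share the reference label set, so no optimal label mapping is searched as standard diarization scoring does; `frame_jer` and `relative_improvement` are hypothetical helper names, and the frame labels are made up. It also checks that the reported 63.9% is consistent with a relative reduction from the x-vector E2E JER of 60.4 to 21.8 on MSCS, which is an assumed reading of the abstract.

```python
import numpy as np


def frame_jer(ref: np.ndarray, hyp: np.ndarray) -> float:
    """Jaccard error rate (in %) for frame-level language labels.

    Assumes one label per frame and that hyp uses the same label set as ref
    (identity mapping, no optimal label assignment). For each reference
    language: error = 1 - |ref frames AND hyp frames| / |ref frames OR hyp frames|;
    the JER is the mean error over reference languages.
    """
    ref = np.asarray(ref)
    hyp = np.asarray(hyp)
    if ref.shape != hyp.shape:
        raise ValueError("ref and hyp must have the same number of frames")
    errors = []
    for lang in np.unique(ref):
        r = ref == lang
        h = hyp == lang
        inter = np.logical_and(r, h).sum()
        union = np.logical_or(r, h).sum()
        errors.append(1.0 - inter / union)
    return 100.0 * float(np.mean(errors))


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative error reduction in % (lower error is better)."""
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical labels: 0 = primary language, 1 = secondary language.
ref = np.array([0, 0, 0, 0, 1, 1, 0, 0])
hyp = np.array([0, 0, 0, 0, 0, 1, 1, 0])
print(round(frame_jer(ref, hyp), 2))                 # 47.62
print(round(relative_improvement(60.4, 21.8), 1))    # 63.9
```

In the toy labels, the short secondary-language segment is shifted by one frame, which costs far more JER for that language (2/3) than for the longer primary-language segment (2/7); this illustrates why short monolingual segments make LD hard to score well.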
