A Deep Dive Into Deep Learning Techniques for Solving Spoken Language Identification Problems

Das, Himanish Shekhar; Roy, Pinki

doi:10.1016/b978-0-12-818130-0.00005-2

Cited by 44 publications

(15 citation statements)

References 8 publications


“…Table 7 shows that the model with the highest accuracy value is the LSTM (4), which has an accuracy of 87.2% and an f1-score of 87.2%. This model is followed by the CNN model (4), which has an accuracy of 86.1%, CNN (2), which has an accuracy of 85.5%, ANN (1), which has an accuracy of 85%, ANN (2), which has an accuracy of 80%, and ANN (4), which has an accuracy of 78.9%. Table 8 shows that with a duration of 10 s, the model with the highest accuracy value is the LSTM model (4), which has an accuracy value of 88.8% and an f1-score of 87%.…”

Section: Results (mentioning)

confidence: 99%

“…The diversity of languages within each tribe, often known as local languages, is an intriguing aspect to incorporate into information technology via spoken language identification. Spoken language identification is the process of utilizing a computer system to determine the language of a spoken utterance [2]. Language identification refers to spoken communication that can be identified by a computer system [3].…”

Section: Introduction (mentioning)

confidence: 99%


Spoken language identification on 4 Indonesian local languages using deep learning

Wijonarko

Zahra

2022

Bulletin EEI


Language identification is at the forefront of assistance in many applications, including multilingual speech systems, spoken language translation, multilingual speech recognition, and human-machine interaction via voice. Identifying Indonesian local languages with spoken language identification technology has enormous potential to advance tourism and digital content in Indonesia. The goal of this study is to identify four Indonesian local languages (Javanese, Sundanese, Minangkabau, and Buginese) using deep learning classification techniques: artificial neural network (ANN), convolutional neural network (CNN), and long short-term memory (LSTM). Features are extracted from the audio data as mel-frequency cepstral coefficients (MFCC). The results showed that the LSTM model had the highest accuracy for each speech duration (3 s, 10 s, and 30 s), followed by the CNN and ANN models.
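The citation statements for this paper rank models by accuracy and F1-score. As a rough illustration of how the two metrics are computed for a four-class task like this one, here is a minimal sketch. The labels and predictions are made up for illustration and are not data from the paper, and macro-averaging the per-language F1 is an assumption, since the excerpts do not say which averaging the authors used.

```python
from collections import Counter

# The four target languages named in the abstract.
LANGS = ["Javanese", "Sundanese", "Minangkabau", "Buginese"]


def accuracy(y_true, y_pred):
    """Fraction of utterances whose predicted language matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[lab] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical toy labels: two utterances per language.
y_true = ["Javanese", "Javanese", "Sundanese", "Sundanese",
          "Minangkabau", "Minangkabau", "Buginese", "Buginese"]
y_pred = ["Javanese", "Sundanese", "Sundanese", "Sundanese",
          "Minangkabau", "Minangkabau", "Buginese", "Minangkabau"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred, LANGS):.3f}")
```

Accuracy and macro F1 coincide only when errors are spread evenly across classes, which is one reason a results table may report both.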


“…Although the design of intelligent tutor system is called “intelligent” tutor system for evaluating students, it is not smart enough, and it still manages student information and arranges courses according to established rules, which cannot really replace tutors, and has certain limitations. Literature [20] lists the challenges faced by the modeling of simulated human scoring from the process and result levels, and points out that it is impossible to model the evaluation process comprehensively in the field of speech features and speech recognition at present. Literature [21] puts forward that under the dual influence of the abnormal needs of oral English learning and the development requirements of human computer interaction, the pronunciation evaluation system based on language lab recognition technology came into being.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-Feature Intelligent Oral English Error Correction Based on Few-Shot Learning Technology

Zhang

Sun

2022

Computational Intelligence and Neuroscience


The computer-aided language teaching system is maturing thanks to the advancement of few-shot learning technologies. To support teachers and increase students’ learning efficiency, more computer-aided language teaching systems are being used in teaching and examinations. This study focuses on a multi-feature fusion-based evaluation method for oral English learning that evaluates specific grammar and helps oral learners improve their pronunciation skills. Building on a preliminary study of speech recognition technology, scoring methods, and the relevant theory of information feedback, the study proposes an improved method based on HMM a posteriori probability scoring: a single standard reference model is no longer the only basis for scoring and error determination; instead, the average level of standard pronunciation across the entire corpus is introduced as a second basis for judgment. This strategy can reduce the scoring limitation caused by individual differences in standard pronunciation, lower the system’s misjudgment rate in detecting pronunciation errors, and improve the usefulness of error correction information. An expert opinion database has been created based on the most prevalent forms of spoken pronunciation problems; combining its correction information can help learners improve their spoken English level. The study also proposes an artificial scoring system for spoken English that performs identification, scoring, error judgment, and correction feedback, among other activities. Finally, trials and tests demonstrate that adding the average pronunciation level improves the system’s scoring performance and has a certain effect on raising users’ oral pronunciation level.


“…Second, they perform language identification using the Bernoulli Naive Bayes approach on a dataset consisting of 22 languages. When comparing CNN and model fitting data, it takes a bit longer to complete the comparison. Himanish Shekhar Das et al. [8]: automatic language identification (LID) is a tough research topic in the realm of speech signal processing. It is used as the front end for many different applications, including multilingual conversational systems and spoken language translation.…”

mentioning

confidence: 99%

Spoken Language Recognization Based on Features and Classification Methods

Bam

Degadwala

Upadhyay

et al. 2022

IJSRCSEIT


In Western countries, speech-recognition applications are widely accepted. In East Asia, they are less common. The complexity of the languages might be one of the main reasons for this lag. Furthermore, multilingual nations such as India must be considered in order to achieve language recognition (words and phrases) from speech signals. Over the last decade, experts have been calling for more research on speech. To identify the spoken language, a pitch and audio feature extraction technique is applied in the initial pre-processing step, followed by a deep learning classification method. This review discusses various feature extraction approaches along with their advantages and disadvantages. The purpose of this research is to study transfer learning approaches such as AlexNet, VGGNet, and ResNet, as well as CNNs; the CNN model gave the best accuracy for language recognition.

