2019
DOI: 10.1007/978-3-030-34255-5_17

Spoken Language Identification Using ConvNets

Abstract: Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying languages, we can either adopt an implicit approach where only the speech for a language is present or an explicit one where text is available with its corresponding transcript. This paper focuses on an implicit approach due to the absence of transcriptive data. This paper benchma…

Citations: cited by 24 publications (16 citation statements)
References: 23 publications
“…The attributes of the proposed method are represented in Table 6. The trial and error method is used while running the convolution neural network [8, 14], word embedding Keras [34, 35], and Naïve Bayes [36–38]. The selection of hyperparameter is also defined as an NP-complete problem [39, 40].…”
Section: Results (citation type: mentioning)
confidence: 99%
“…Various state-of-the-art results on various audio classification tasks have been obtained by using log-Mel spectrograms of raw audio, like features, which convert the audio utterance into images [8]. CNN gives an excellent performance gain in classification on these features [14]. The motivation of work has come from these studies.…”
Section: Proposed Spoken Language Identification Framework (citation type: mentioning)
confidence: 99%
“…In a more traditional linguistic setting, Sarthak et al [34] explore 1D-ConvNet that auto-extracts and classifies features from raw audio input and 2D-ConvNet architectures, and enhance the performance of these approaches by utilizing Mixup augmentation of inputs and attention mechanism. They achieve 93.7% and 95.4% overall accuracy on a six language (En, Fr, De, Es, Ru, It) dataset with overlapping phonemes based on the VoxForge [38] dataset.…”
Section: Previous Work (citation type: mentioning)
confidence: 99%
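The citation statement above mentions Mixup augmentation of inputs but gives no detail. As a rough illustration only, the sketch below shows the generic Mixup idea: each training example and its one-hot label are blended with another example from the same batch, using a weight drawn from a Beta distribution. The function name `mixup_batch`, the default `alpha=0.2`, and the array shapes are assumptions made for this example; nothing here is taken from the cited paper's implementation.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Blend a batch with a shuffled copy of itself (generic Mixup sketch).

    x: array of inputs, shape (batch, ...), e.g. raw waveforms or spectrograms.
    y: one-hot labels, shape (batch, num_classes).
    alpha: Beta distribution parameter (0.2 is an assumed, illustrative value).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = rng.permutation(len(x))        # partner example for each item
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]
    return x_mixed, y_mixed, lam

# Hypothetical usage: 4 one-second clips at 16 kHz, 6 language classes.
rng = np.random.default_rng(0)
clips = rng.standard_normal((4, 16000))
labels = np.eye(6)[[0, 1, 2, 3]]
mixed_clips, mixed_labels, lam = mixup_batch(clips, labels, rng=rng)
```

Because the labels are mixed with the same weight as the inputs, each row of `mixed_labels` still sums to 1 and can be used directly with a cross-entropy loss that accepts soft targets.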
“…Sarthak, et al [21] used raw audio waveforms as sound input which provide increased performance by avoiding overheads in calculating log-Mel spectrum for each audio file. This study uses the convolutional neural networks classification method because based on the journals studied, convolutional neural networks have very good performance compared to other machine learning techniques, the sound input data used is raw audio waveform because it is quite popular, because raw audio waveforms have advantages.…”
Section: Introduction (citation type: mentioning)
confidence: 99%