The Speaker and Language Recognition Workshop (Odyssey 2016)
DOI: 10.21437/odyssey.2016-16

Deep Language: a comprehensive deep learning approach to end-to-end language recognition

Abstract: This work explores the use of various Deep Neural Network (DNN) architectures for an end-to-end language identification (LID) task. The approach has been proven to significantly improve the state of the art in many domains, including speech recognition, computer vision, and genomics. As an end-to-end system, deep learning removes the burden of hand-crafting the feature extraction pipeline that is the conventional approach in LID. This versatility is achieved by training a very deep network to learn distributed representations of speech…
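The truncated abstract does not spell out the network itself, so the following is only a minimal sketch of what an end-to-end LID model of this kind can look like: a small convolutional encoder applied directly to log-mel spectrograms, followed by a linear classifier over languages. All layer sizes, the 40-mel input, and the eight-language output are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of an end-to-end LID network (assumed layout, not the paper's).
import torch
import torch.nn as nn

class EndToEndLID(nn.Module):
    def __init__(self, n_languages: int, n_mels: int = 40):
        super().__init__()
        # The convolutional front end learns local time-frequency patterns
        # directly from the spectrogram, replacing hand-crafted features.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Global average pooling yields a fixed-size utterance embedding
        # regardless of input duration.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, n_languages)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        h = self.encoder(spectrogram)      # (batch, 64, mels/4, frames/4)
        h = self.pool(h).flatten(1)        # (batch, 64) utterance embedding
        return self.classifier(h)          # raw logits over languages

model = EndToEndLID(n_languages=8)
logits = model(torch.randn(4, 1, 40, 300))  # 4 utterances, 300 frames each
```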

Cited by 22 publications (21 citation statements, published 2018–2024). References 11 publications.
“…As a result, an algorithm which implicitly encapsulates meaningful patterns from multi-modal data into its latent space during the training phase would be more robust and practical. • Enforcing the end-to-end design [22,23] avoids the complications of intractable stacked errors, poor scalability to massive data sets, and the challenges of practical deployment. • Unlike conventional semi-supervised learning, where an unsupervised objective is created in order to improve the supervised task [15,24], semi-supervised learning for single-cell data aims for the opposite.…”
Section: Semi-supervised Learning for Single-cell Data
confidence: 99%
“…Recently, end-to-end approaches have achieved impressive performance compared to the conventional i-vector approach for both LID [4,5,12,25] and speaker recognition [7,6,26]. In [12], the authors conducted detailed experiments on an end-to-end system using a dataset augmentation approach with acoustic features ranging from Mel-Frequency Cepstral Coefficients (MFCCs) to spectrograms.…”
Section: End-to-end CNN/DNN System
confidence: 99%
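As a side note on the two feature types contrasted in [12], both can be computed with standard tooling. The snippet below is an illustrative sketch using librosa; the parameter values (n_mels=40, n_mfcc=20) and the bundled example clip are assumptions, not the configuration of the cited work.

```python
# Computing the two acoustic inputs mentioned above with librosa.
import librosa

# Any mono waveform works; librosa's bundled example clip is used here
# purely for illustration (it is downloaded on first use).
y, sr = librosa.load(librosa.ex("trumpet"))

# Log-mel spectrogram: the "raw" time-frequency input.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
log_mel = librosa.power_to_db(mel)                   # shape (40, n_frames)

# MFCCs: a compact, decorrelated summary of the same spectrum.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # shape (20, n_frames)
```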
“…After the training, we observed high variation of the performance among different utterance encodings. The issue can be traced to the imbalance in the utterance distribution between encodings, which has a strong negative impact on the network's generalization performance [16,17]. Specifically, each training step can drive the network to a different sub-optimal solution created by the dominant classes [16]. Since deep learning, in general, can be seen as an automatic feature learning algorithm [18], the network should adapt its representation for modeling the language pattern in all encodings.…”
Section: Cost Adaptive Objective
confidence: 99%
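The quoted passage does not spell out the cost-adaptive objective itself; one common instance of the idea is to weight the cross-entropy loss inversely to class frequency, so that dominant classes no longer dominate each gradient step. The sketch below illustrates that general remedy with made-up class counts; the paper's actual objective may differ.

```python
# Class-weighted cross-entropy as one remedy for class imbalance.
import torch
import torch.nn as nn

counts = torch.tensor([5000., 1200., 300.])       # utterances per class (assumed)
# "Balanced" weighting: rarer classes receive proportionally larger weights.
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)                        # model outputs for a batch
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)                  # dominant classes down-weighted
```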