Deep Language: a comprehensive deep learning approach to end-to-end language recognition

Trọng, Trung Ngô; Hautamäki, Ville; Lee, Kong Aik

doi:10.21437/odyssey.2016-16

Cited by 22 publications

(21 citation statements)

References 11 publications


“…As a result, an algorithm that implicitly encapsulates meaningful patterns from multi-modal data into its latent space during the training phase would be more robust and practical. • Enforcing the end-to-end design [22,23] to avoid the complication of intractable stacked errors, poor scalability to massive data sets, and difficult practical deployment. • Unlike conventional semi-supervised learning, where an unsupervised objective is created in order to improve the supervised task [15,24], semi-supervised learning for single-cell data aims for the opposite.…”

Section: Semi-supervised Learning for Single-cell Data · mentioning


SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data

Kramer, Mehtonen, et al., 2019

Preprint, May 8, 2019

Self Cite


Single-cell transcriptomics offers a tool to study the diversity of cell phenotypes through snapshots of the abundance of mRNA in individual cells. Often there is additional information available besides the single-cell gene expression counts, such as bulk transcriptome data from the same tissue, or quantification of surface protein levels from the same cells. In this study, we propose models based on the Bayesian generative approach, where protein quantification, available as CITE-seq counts from the same cells, is used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture.

Keywords: semi-supervised · single-cell · RNA sequencing · deep learning · Bayesian inference

1 Introduction

Single-cell RNA sequencing (scRNA-seq) [1,2,3] is a powerful tool for analyzing cell states based on their gene expression profiles at high resolution. RNA sequencing at the single-cell level facilitates uncovering heterogeneous gene expression patterns in seemingly homogeneous cell populations. However, current methods for gene expression profiling at single-cell resolution are prone to experimental errors, in particular the inefficient capture of mRNAs [2]. This capture inefficiency results in a general underestimation of the counts (the dropout effect). This is a problem because current computational approaches for analyzing single-cell data rely on the mRNA counts for clustering and downstream analysis.

Generally, the dropout problem has been posed as an imputation task, where missing counts are filled in with estimated counts. Different methods have been proposed for this task, such as non-negative regression [4] or graph-based methods [5].
Another option is to model the dropout effect using the zero-inflated (ZI) model [6], in which a two-component mixture distribution is constructed: the first component models the dropout effect and the second models the observed counts. Because overdispersion is strongly present in scRNA-seq counts, the negative binomial (NB) distribution is considered an appropriate fit to the observed data [7]. Shallow imputation models based on zero-inflated negative binomial (ZINB) or zero-inflated log-normal distributions have been applied to single-cell data [8,9]. However, these models hypothesize a linear relation between the latent space and the model parameters, which is quite a strong assumption [10]. To overcome the limitations of linear models, deep neural network architectures have been proposed to resolve missing data (dropouts) [11]. However, discerning technical variation from biological signal solely on the basis of scRNA-seq data is challenging, and assumes that a large number of similar cells are measured.

Accurate imputation strategies are important for downstream analysis, including identification of cell type marker genes, characterization of functional state [12], or the analysis of transcriptome dynamics along differentia...
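The zero-inflated negative binomial described above can be made concrete with its per-count log-likelihood. This is a minimal sketch, not SISUA's implementation: it assumes the common mean/inverse-dispersion parameterization of the NB, and the names `mu`, `theta`, `pi` and both function names are illustrative choices rather than the preprint's notation.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-pmf of a negative binomial with mean mu > 0 and inverse dispersion theta > 0.

    Its variance is mu + mu**2 / theta, so a smaller theta means stronger
    overdispersion than a Poisson with the same mean.
    """
    log_denom = math.log(theta + mu)
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * (math.log(theta) - log_denom)
        + x * (math.log(mu) - log_denom)
    )


def zinb_log_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi a count is a dropout zero,
    otherwise it is drawn from NB(mu, theta). Requires 0 <= pi < 1."""
    if x == 0:
        # An observed zero is either a dropout or a genuine zero count from the NB.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    # A positive count can only come from the NB component.
    return math.log1p(-pi) + nb_log_pmf(x, mu, theta)
```

The two branches mirror the two mixture components: zeros receive probability mass from both the dropout component and the NB, while positive counts come only from the NB, so the mixture mean is (1 - pi) * mu. In a deep model such as a VAE, `mu`, `theta` and `pi` would be per-cell, per-gene outputs of the decoder rather than fixed scalars.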


“…Recently, end-to-end approaches have achieved impressive performance compared to the conventional i-vector approach for both LID [4,5,12,25] and speaker recognition [7,6,26]. In [12], the authors conducted detailed experiments on an end-to-end system using a dataset augmentation approach with acoustic features ranging from Mel-Frequency Cepstral Coefficients (MFCCs) to spectrograms.…”

Section: End-to-end CNN/DNN System · mentioning


Unsupervised Representation Learning of Speech for Dialect Identification

Shon, Hsu, Glass, 2018

2018 IEEE Spoken Language Technology Workshop (SLT)


In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID). An FHVAE can learn a latent space that separates the more static attributes within an utterance from the more dynamic attributes by encoding them into two different sets of latent variables. Useful factors for dialect identification, such as phonetic or linguistic content, are encoded by a segmental latent variable, while irrelevant factors that are relatively constant within a sequence, such as channel or speaker information, are encoded by a sequential latent variable. The disentanglement property makes the segmental latent variable less susceptible to channel and speaker variation, and thus reduces degradation from channel domain mismatch. We demonstrate that on fully supervised DID tasks, an end-to-end model trained on the features extracted from the FHVAE model achieves the best performance, compared to the same model trained on conventional acoustic features and an i-vector based system. Moreover, we also show that the proposed approach can leverage a large amount of unlabeled data for FHVAE training to learn domain-invariant features for DID, and significantly improve performance in a low-resource condition, where labels for the in-domain data are not available.


“…After the training, we observed high variation of the performance among different utterance encodings. The issue can be traced to the imbalance in the utterance distribution between encodings, which has a strong negative impact on the network generalization performance [16,17]. Specifically, each training step can drive the network to a different sub-optimal solution created by the dominant classes [16].…”

Section: Cost Adaptive Objective · mentioning


“…The issue can be traced to the imbalance in the utterance distribution between encodings, which has a strong negative impact on the network generalization performance [16,17]. Specifically, each training step can drive the network to a different sub-optimal solution created by the dominant classes [16]. Since deep learning, in general, can be seen as an automatic feature learning algorithm [18], the network should adapt its representation for modeling the language pattern in all encodings.…”

Section: Cost Adaptive Objective · mentioning


“…The backend is replaced by heteroscedastic linear discriminant analysis (HLDA) [20] and the cosine similarity is used subsequently. We use the combination of CNN, LSTM and feedforward neural network as described in [16]. The network design is specifically fine-tuned for our task and re-scaled to match the size of the North Sami corpora.…”

Section: Baseline Systems · mentioning

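The baseline quoted above projects utterance representations with HLDA and then scores them with cosine similarity. The sketch below covers only the scoring step, under stated assumptions: the embeddings are taken as already projected (HLDA itself is not implemented), each language is represented by the mean of its training embeddings (a common convention, assumed here rather than confirmed by the snippet), and the function names and toy language labels are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def language_means(embeddings, labels):
    """Average the (already projected) training embeddings of each language."""
    sums, counts = {}, defaultdict(int)
    for emb, lang in zip(embeddings, labels):
        if lang not in sums:
            sums[lang] = [0.0] * len(emb)
        sums[lang] = [s + e for s, e in zip(sums[lang], emb)]
        counts[lang] += 1
    return {lang: [s / counts[lang] for s in vec] for lang, vec in sums.items()}


def score_utterance(embedding, means):
    """Cosine score of one test embedding against every language model vector."""
    return {lang: cosine(embedding, m) for lang, m in means.items()}
```

The predicted language is the key with the highest score. Because cosine similarity ignores vector length, only the direction of the projected embedding matters, which is one reason it is a popular backend after a linear projection such as HLDA.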


Staircase Network: structural language identification via hierarchical attentive units

Trọng, Hautamäki, Jokinen, 2018

The Speaker and Language Recognition Workshop (Odyssey 2018)

Self Cite


A language recognition system is typically trained directly to optimize classification error on the target language labels, without using external information, or meta-information, in the estimation of the model parameters. However, labels are not independent of each other: there is a dependency enforced by, for example, the language family, which negatively affects classification. Other external information sources (e.g. audio encoding, telephony or video speech) can also decrease classification accuracy. In this paper, we attempt to solve these issues by constructing a deep hierarchical neural network, where different levels of meta-information are encapsulated by attentive prediction units and also embedded into the training process. The proposed method learns auxiliary tasks to obtain a robust internal representation and to construct a variant of attentive units within the hierarchical model. The final result is the structural prediction of the target language and a closely related language family. The algorithm reflects a "staircase" way of learning in both its architecture and training, advancing from the fundamental audio encoding to the language family level and finally to the target language level. This process not only improves generalization but also tackles the issues of imbalanced class priors and channel variability in the deep neural network model. Our experimental findings show that the proposed architecture outperforms state-of-the-art i-vector approaches on both small and large language corpora by a significant margin.
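The abstract above lists imbalanced class priors among the problems the Staircase Network tackles, and it does so through hierarchical training. For comparison only, a simpler and widely used baseline remedy is to reweight the cross-entropy loss by inverse class frequency. The sketch below shows that generic technique; it is not the paper's cost-adaptive objective, and the function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight class c by N / (K * count_c): rare classes get larger weights,
    and the weights summed over all N training samples equal N."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -weights[y] * log p(y | x) over a batch.

    probs[i] is a dict mapping each class to the predicted probability for sample i.
    """
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)
```

With a 9:1 split between two classes, the minority class receives nine times the weight of the majority class, so each of its errors costs the same in aggregate and the dominant class can no longer steer every update on its own.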
