A Bayesian approach termed BAyesian Least Squares Optimization with Nonnegative $L_1$-norm constraint (BALSON) is proposed. The error distribution of the data fitting is described by a Gaussian likelihood, and the parameter distribution is assumed to be a Dirichlet distribution. By Bayes' rule, searching for the optimal parameters is equivalent to finding the mode of the posterior distribution. To explicitly characterize the nonnegative $L_1$-norm constraint on the parameters, we further approximate the true posterior distribution by a Dirichlet distribution, whose statistics we estimate by sampling; four sampling methods are introduced. With the estimated posterior distributions, the original parameters can be effectively reconstructed in polynomial fitting problems, and the BALSON framework is found to outperform conventional methods.

Index Terms: Bayesian learning, least squares optimization, $L_1$-norm constraint, Dirichlet distribution, sampling method

1. INTRODUCTION

In machine learning and statistics, optimization methods are widely applied, including Newton's method [1], the quasi-Newton method [1], the sequential quadratic programming (SQP) method [2], the gradient descent method [3], the interior-point (IP) method [4], and Bayesian methods [5, 6, 7]. Least squares optimization (LSO), an unconstrained optimization problem, takes the residual sum of squares (RSS) of the fitting errors as its objective function, and it can be solved by proven algorithms with low computational complexity [8, 9]. On this foundation, introducing constraints helps achieve numerical stability and improve predictive performance [9].

Sparsity is a common constraint that makes the objective function depend on only a small number of model parameters. $L_0$- and $L_1$-norm regularizations are the constraints most commonly used to induce sparsity. The $L_0$-norm, denoted $\|\cdot\|_0$ and defined as the number of nonzero elements in the parameter vector, expresses sparsity most precisely, yet is difficult to optimize in practice. The $L_1$-norm, denoted $\|\cdot\|_1$ and defined as the sum of the absolute values of the elements of the parameter vector, imposes a strong sparsity constraint on the vector and is convenient to apply. With $L_1$-norm regularization, sparse representation [10], nonlinear programming [11], and nonlinear time series prediction [12] have all been addressed.

In addition to the aforementioned methods, the Bayesian framework offers an alternative. Under a probabilistic interpretation, the LSO problem (i.e., the RSS objective function) is usually treated as a Gaussian likelihood, and the constraint is regarded as a prior distribution. Combining the likelihood function with the prior distribution via Bayes' theorem, finding the optimal solution to the constrained LSO problem is then equivalent to calculating the mode of the posterior distribution.
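To make this probabilistic interpretation concrete, here is a minimal Python sketch of the idea on a toy polynomial fitting problem: the coefficients are constrained to the probability simplex (nonnegative with unit $L_1$-norm), the RSS enters as a Gaussian likelihood, a Dirichlet prior encodes the constraint, and a Dirichlet approximation to the posterior is fitted by moment matching over importance-weighted prior samples. The importance-sampling and moment-matching procedure and all hyperparameters (sigma, S, alpha_prior) are illustrative assumptions, not necessarily the paper's exact sampling methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy polynomial fitting problem: y = X @ w_true + noise, with the true
# coefficients on the probability simplex (nonnegative, summing to one),
# i.e. satisfying the nonnegative L1-norm constraint.
n, K, sigma = 50, 4, 0.1                     # samples, coefficients, noise std
x = rng.uniform(-1.0, 1.0, size=n)
X = np.vander(x, K, increasing=True)         # design matrix [1, x, x^2, x^3]
w_true = np.array([0.5, 0.2, 0.2, 0.1])      # lies on the simplex
y = X @ w_true + sigma * rng.normal(size=n)

# Importance sampling: draw candidate coefficient vectors from a flat
# Dirichlet prior and weight each by its Gaussian likelihood,
# proportional to exp(-RSS / (2 * sigma^2)).
S = 100_000
alpha_prior = np.ones(K)                     # flat Dirichlet prior (assumption)
W = rng.dirichlet(alpha_prior, size=S)       # candidate parameter vectors
rss = ((y - W @ X.T) ** 2).sum(axis=1)
logw = -rss / (2.0 * sigma ** 2)
w_is = np.exp(logw - logw.max())             # stabilized importance weights
w_is /= w_is.sum()

# Moment-match a Dirichlet to the weighted samples: the total
# concentration is chosen so the first component's variance is matched.
mean = w_is @ W
var1 = w_is @ (W[:, 0] - mean[0]) ** 2
s = mean[0] * (1.0 - mean[0]) / var1 - 1.0   # total concentration
alpha_post = mean * s                        # approximate Dirichlet posterior

# The mode of the fitted Dirichlet, (alpha_i - 1) / (sum(alpha) - K),
# serves as the point estimate of the coefficients.
w_mode = (alpha_post - 1.0) / (alpha_post.sum() - K)
print("true coefficients:", w_true)
print("posterior mode:   ", np.round(w_mode, 3))
```

Note that the Dirichlet mode formula is only defined when every $\alpha_i > 1$; with a reasonably peaked likelihood the fitted concentration is large, so this holds in the sketch above. For sharper posteriors or higher-dimensional parameter vectors, this simple importance sampler degenerates and would need to be replaced by a more capable sampling method, such as those the paper introduces.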