A Bayesian approach termed BAyesian Least Squares Optimization with Nonnegative $L_1$-norm constraint (BALSON) is proposed. The error distribution of the data fitting is described by a Gaussian likelihood, and the parameter distribution is assumed to be a Dirichlet distribution. By Bayes' rule, searching for the optimal parameters is equivalent to finding the mode of the posterior distribution. To explicitly characterize the nonnegative $L_1$-norm constraint on the parameters, we further approximate the true posterior distribution by a Dirichlet distribution, whose statistics we estimate by sampling; four sampling methods are introduced. With the estimated posterior distributions, the original parameters can be effectively reconstructed in polynomial fitting problems, and the BALSON framework is found to outperform conventional methods.

Index Terms: Bayesian learning, least squares optimization, $L_1$-norm constraint, Dirichlet distribution, sampling method

1. INTRODUCTION

In machine learning and statistics, optimization methods are widely applied, including Newton's method [1], the quasi-Newton method [1], the sequential quadratic programming (SQP) method [2], the gradient descent method [3], the interior-point (IP) method [4], and Bayesian methods [5, 6, 7]. Least squares optimization (LSO), an unconstrained optimization problem, takes the residual sum of squares (RSS) of the fitting errors as its objective function, and it can be solved by proven algorithms with low computational complexity [8, 9]. On this foundation, introducing constraints helps achieve numerical stability and improve predictive performance [9].

Sparsity is a common constraint that makes the objective function depend on only a small number of model parameters. $L_0$- and $L_1$-norm regularizations are the constraints most commonly used to induce sparsity. The $L_0$-norm, denoted $\|\cdot\|_0$ and defined as the number of nonzero elements in the parameter vector, expresses sparsity most precisely, yet is difficult to optimize in practice. The $L_1$-norm, denoted $\|\cdot\|_1$ and defined as the sum of the absolute values of the elements of the parameter vector, imposes a strong sparsity constraint on the vector and is convenient to apply. With $L_1$-norm regularization, sparse representation [10], nonlinear programming [11], and nonlinear time series prediction [12] have all been addressed.

In addition to the aforementioned methods, the Bayesian framework offers an alternative. Under a probabilistic interpretation, the LSO problem (i.e., the RSS objective function) is usually treated as a Gaussian likelihood, and the constraint is regarded as a prior distribution. Combining the likelihood function with the prior distribution via Bayes' theorem, finding the optimal solution to the constrained LSO problem is then equivalent to calculating the mode of the posterior distribution.
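To make this probabilistic interpretation concrete, here is a minimal Python sketch of the idea on a toy polynomial fitting problem: the coefficients are constrained to the probability simplex (nonnegative with unit $L_1$-norm), the RSS enters as a Gaussian likelihood, a Dirichlet prior encodes the constraint, and a Dirichlet approximation to the posterior is fitted by moment matching over importance-weighted prior samples. The importance-sampling and moment-matching procedure and all hyperparameters (sigma, S, alpha_prior) are illustrative assumptions, not necessarily the paper's exact sampling methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy polynomial fitting problem: y = X @ w_true + noise, with the true
# coefficients on the probability simplex (nonnegative, summing to one),
# i.e. satisfying the nonnegative L1-norm constraint.
n, K, sigma = 50, 4, 0.1                     # samples, coefficients, noise std
x = rng.uniform(-1.0, 1.0, size=n)
X = np.vander(x, K, increasing=True)         # design matrix [1, x, x^2, x^3]
w_true = np.array([0.5, 0.2, 0.2, 0.1])      # lies on the simplex
y = X @ w_true + sigma * rng.normal(size=n)

# Importance sampling: draw candidate coefficient vectors from a flat
# Dirichlet prior and weight each by its Gaussian likelihood,
# proportional to exp(-RSS / (2 * sigma^2)).
S = 100_000
alpha_prior = np.ones(K)                     # flat Dirichlet prior (assumption)
W = rng.dirichlet(alpha_prior, size=S)       # candidate parameter vectors
rss = ((y - W @ X.T) ** 2).sum(axis=1)
logw = -rss / (2.0 * sigma ** 2)
w_is = np.exp(logw - logw.max())             # stabilized importance weights
w_is /= w_is.sum()

# Moment-match a Dirichlet to the weighted samples: the total
# concentration is chosen so the first component's variance is matched.
mean = w_is @ W
var1 = w_is @ (W[:, 0] - mean[0]) ** 2
s = mean[0] * (1.0 - mean[0]) / var1 - 1.0   # total concentration
alpha_post = mean * s                        # approximate Dirichlet posterior

# The mode of the fitted Dirichlet, (alpha_i - 1) / (sum(alpha) - K),
# serves as the point estimate of the coefficients.
w_mode = (alpha_post - 1.0) / (alpha_post.sum() - K)
print("true coefficients:", w_true)
print("posterior mode:   ", np.round(w_mode, 3))
```

Note that the Dirichlet mode formula is only defined when every $\alpha_i > 1$; with a reasonably peaked likelihood the fitted concentration is large, so this holds in the sketch above. For sharper posteriors or higher-dimensional parameter vectors, this simple importance sampler degenerates and would need to be replaced by a more capable sampling method, such as those the paper introduces.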