This article introduces the probabilistic tensor decomposition toolbox, a MATLAB toolbox for tensor decomposition using variational Bayesian inference and Gibbs sampling. An introduction to and overview of probabilistic tensor decomposition and its connection with classical tensor decomposition methods based on maximum likelihood are provided. We subsequently describe the probabilistic tensor decomposition toolbox, which encompasses the Canonical Polyadic, Tucker, and Tensor Train decomposition models. Currently, unconstrained, non-negative, orthogonal, and sparse factors are supported. Bayesian inference provides a principled way of incorporating prior knowledge, predicting held-out data, and estimating posterior probabilities. Furthermore, it facilitates automatic model order determination and automatic regularization of factors (e.g. sparsity), and it inherently penalizes model complexity, which is beneficial when inferring hierarchical models such as heteroscedastic noise models. The toolbox allows researchers to easily apply Bayesian tensor decomposition methods without having to derive or implement these methods themselves, and it serves as a reference implementation for comparing existing and new tensor decomposition methods. The software is available from https://github.com/JesperLH/prob-tensor-toolbox/.
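For reference, the standard (maximum likelihood) forms of the three supported decomposition models can be sketched as follows; the notation (a third-order tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ with factor matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, model orders $D$ and $D_1, D_2, D_3$, and Tensor Train ranks $R_0, \ldots, R_N$ with $R_0 = R_N = 1$) is chosen here for illustration and is not taken from the article itself.

Canonical Polyadic (CP), a sum of $D$ rank-one terms:
\[
x_{ijk} \approx \sum_{d=1}^{D} a_{id}\, b_{jd}\, c_{kd}.
\]

Tucker, with a core array $\mathcal{G}$ weighting all combinations of factor columns:
\[
x_{ijk} \approx \sum_{d_1=1}^{D_1} \sum_{d_2=1}^{D_2} \sum_{d_3=1}^{D_3} g_{d_1 d_2 d_3}\, a_{i d_1}\, b_{j d_2}\, c_{k d_3}.
\]

Tensor Train, for an order-$N$ tensor, as a product of matrix-valued cores $\mathbf{G}_n(i_n) \in \mathbb{R}^{R_{n-1} \times R_n}$:
\[
x_{i_1 i_2 \cdots i_N} \approx \mathbf{G}_1(i_1)\, \mathbf{G}_2(i_2) \cdots \mathbf{G}_N(i_N).
\]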
Bayesian tensor decomposition

For comprehensive reviews of maximum likelihood-based tensor decomposition methods, the reader is referred to existing tensor decomposition reviews [6-9, 22, 23]. We presently provide a short overview of Bayesian tensor decomposition. For brevity, this section drops the prefix "Bayesian", and unless otherwise stated all models are based on fully Bayesian inference.

Currently, probabilistic models are primarily based on the Tucker and CP decompositions. The earliest works using the Tucker decomposition are found in [24, 25], where both the core array and the factors follow a normal distribution. The latter also considered enforcing sparsity on the core array to automatically learn the most prominent multi-linear interactions. However, neither of these methods was fully Bayesian, as they relied on maximum a posteriori estimation, which provides only a point estimate of the posterior distribution. The first fully Bayesian Tucker model with normal factors, using Gibbs sampling for inference, was proposed in [26], and the indeterminacy and structure of the core array were explored in [27]. The Tucker decomposition was extended to handle missing values and sparse noise based on variational Bayesian (VB) inference [28], whereas a Tucker model with an infinite core size was explored in [29-31] with factors specified as Gaussian processes and inference based on VB. Extensions of the Tucker model to count data were explored in [32, 33] using the Poisson likelihood and either Gamma or Dirichlet factors. For categorical data, a Tucker model with multinomial likelihood and factors was considered in [34]. ...
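To make the above concrete, the following is a minimal sketch of one generic fully Bayesian Tucker specification with normal factors and core and a conjugate Gamma prior on the noise precision $\tau$, written in the spirit of the normally distributed Tucker models discussed above; the specific priors, hyperparameters ($\alpha$ and $\beta$ below are placeholders), and inference schemes differ between the individual cited works.

\[
x_{ijk} \mid \mathbf{A}, \mathbf{B}, \mathbf{C}, \mathcal{G}, \tau \;\sim\; \mathcal{N}\!\Big( \textstyle\sum_{d_1, d_2, d_3} g_{d_1 d_2 d_3}\, a_{i d_1}\, b_{j d_2}\, c_{k d_3},\; \tau^{-1} \Big),
\]
\[
a_{i d_1} \sim \mathcal{N}(0, 1), \quad b_{j d_2} \sim \mathcal{N}(0, 1), \quad c_{k d_3} \sim \mathcal{N}(0, 1), \quad g_{d_1 d_2 d_3} \sim \mathcal{N}(0, 1), \quad \tau \sim \mathrm{Gamma}(\alpha, \beta).
\]

Posterior inference over the factors, core array, and noise precision is then carried out either by Gibbs sampling or by a variational Bayesian approximation, as in the works cited above.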