Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, according to some probability distribution. Randomized predictors are obtained by sampling from a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors (both constructions are written out in symbols in the sketch at the end of this section). In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds.

Since the original PAC-Bayes bounds [163, 124], these tools have been considerably improved in many directions (we will, for example, describe a simplified version of the localization technique of [39, 41] that was missed by the community, and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds have received considerable attention: for example, there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One reason for this recent interest is the successful application of these bounds to neural networks [65].

An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.

This is a preliminary version. If you find any typo or mistake, or if you think your work should be cited and is not, please let me know, and I will update the tutorial accordingly. Since the first version: fixed (minor) problems in Theorem 4.5, in Lemma 4.6 and in Subsection 6.5.2, fixed many typos (including some in the proof of Theorem 4.3), and included references [26, 131, 114].

1 I don't want to scare the reader with measurability conditions, as I will not check them in this tutorial anyway. Here, the exact condition ensuring that what follows is well defined is that, for any $A \in \mathcal{T}$, the function $((x_1, y_1), \dots, (x_n, y_n)) \mapsto [\rho((x_1, y_1), \dots, (x_n, y_n))](A)$ is measurable. That is, $\rho$ is a regular conditional probability.

2 See the title of van Erven's tutorial [175]: "PAC-Bayes mini-tutorial: a continuous union bound". Note, however, that Catoni argues in [41] that PAC-Bayes bounds are actually more than that; we will come back to this in Section 4.
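To make the vote/draw distinction from the opening paragraph concrete, here is one common way to write both objects in symbols. The notation below (basic predictors $f_\theta$ indexed by a parameter $\theta$ in a set $\Theta$, and a distribution $\rho$ on $\Theta$) is an illustrative sketch, not the tutorial's formal setup, which is introduced later:
$$
\hat{f}_{\rho}(x) = \mathbb{E}_{\theta \sim \rho}\bigl[f_{\theta}(x)\bigr] = \int_{\Theta} f_{\theta}(x)\,\rho(\mathrm{d}\theta) \qquad \text{(aggregated predictor: a weighted vote)},
$$
$$
\tilde{\theta} \sim \rho, \quad x \mapsto f_{\tilde{\theta}}(x) \qquad \text{(randomized predictor: a single draw from } \rho\text{)}.
$$
In both cases, the object produced by the learning procedure is the distribution $\rho$ itself, rather than the solution of a minimization problem.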