2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
DOI: 10.1109/icassp39728.2021.9414259

A Bayesian Interpretation of the Light Gated Recurrent Unit

Abstract: We summarise previous work showing that the basic sigmoid activation function arises as an instance of Bayes's theorem, and that recurrence follows from the prior. We derive a layerwise recurrence without the assumptions of previous work, and show that it leads to a standard recurrence with modest modifications to reflect use of log-probabilities. The resulting architecture closely resembles the Li-GRU, which is the current state of the art for ASR. Although the contribution is mainly theoretical, we show that …
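As context for the abstract's central claim, the following is a minimal sketch (ours, not taken from the paper) of the standard two-hypothesis argument: applying Bayes's theorem to a binary hypothesis φ_t about an observation x_t yields a sigmoid of the log-likelihood-ratio plus the log-prior-odds, and recurrence can then be introduced through the prior term. The notation below is illustrative.

% Two-hypothesis Bayes's theorem: the posterior for hypothesis \phi_t given
% an observation x_t reduces to a sigmoid of the log-likelihood-ratio plus
% the log-prior-odds; recurrence can then enter through the prior term.
\begin{align}
  P(\phi_t \mid x_t)
    &= \frac{p(x_t \mid \phi_t)\, P(\phi_t)}
            {p(x_t \mid \phi_t)\, P(\phi_t) + p(x_t \mid \lnot\phi_t)\, P(\lnot\phi_t)} \\
    &= \sigma\!\left(
         \log \frac{p(x_t \mid \phi_t)}{p(x_t \mid \lnot\phi_t)}
       + \log \frac{P(\phi_t)}{P(\lnot\phi_t)}
       \right),
  \qquad \sigma(a) = \frac{1}{1 + e^{-a}} .
\end{align}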

Cited by 4 publications (3 citation statements) | References 13 publications
“…The gated recurrent unit (GRU) of Cho et al (2014) and the light GRU (liGRU) of Ravanelli et al (2018) constitute gradual simplifications of the LSTM with fewer gates in an effort to reduce the size of recurrent units. Very recently, the authors have derived a probabilistically interpretable version of the liGRU called light Bayesian recurrent unit (liBRU) that showed slight improvements over the liGRU on speech recognition tasks (Bittar and Garner, 2021). We will implement MLPs, RNNs, liBRUs, and GRUs, which will serve as an ANN baseline to compare with our SNNs.…”
Section: Artificial Neural Network (citation type: mentioning, confidence: 99%)
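For readers unfamiliar with the liGRU referenced above, here is a minimal NumPy sketch of a single liGRU step, assuming the single-gate formulation of Ravanelli et al. (2018); batch normalisation of the input projections is omitted for brevity, and all names and shapes are illustrative rather than taken from any of the cited papers' code.

# Minimal NumPy sketch of one light-GRU (liGRU) time step: a single update
# gate, a ReLU candidate state, and no reset gate. Batch normalisation of
# the input projections is omitted; weight names/shapes are illustrative.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def ligru_step(x_t, h_prev, Wz, Uz, Wh, Uh):
    """One liGRU step for a batch of inputs x_t with shape (batch, features)."""
    z_t = sigmoid(x_t @ Wz + h_prev @ Uz)             # update gate
    h_cand = np.maximum(0.0, x_t @ Wh + h_prev @ Uh)  # ReLU candidate state
    return z_t * h_prev + (1.0 - z_t) * h_cand        # interpolated new state

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
F, H, B = 5, 4, 2                       # input size, hidden size, batch size
Wz, Wh = rng.normal(size=(F, H)), rng.normal(size=(F, H))
Uz, Uh = rng.normal(size=(H, H)), rng.normal(size=(H, H))
h = np.zeros((B, H))
for t in range(3):                      # unroll a short random sequence
    h = ligru_step(rng.normal(size=(B, F)), h, Wz, Uz, Wh, Uh)
print(h.shape)                          # (2, 4)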
“…Similarly to [19], for the case of multivariate normal distributions that share the same covariance matrix $\Sigma$, i.e., $p(x_t \mid \phi_t) \sim \mathcal{N}(\mu, \Sigma)$ and $p(x_t \mid \lnot\phi_t) \sim \mathcal{N}(\nu, \Sigma)$, the parameters $W \in \mathbb{R}^{F \times H}$ and $b \in \mathbb{R}^{H}$ can be expressed as, …”
Section: Neural Network Formulation (citation type: mentioning, confidence: 99%)
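The quote is truncated before the expressions themselves; as a hedged reminder (not copied from the citing paper), the standard closed form for this shared-covariance Gaussian case gives, for each hidden feature h with class means µ_h and ν_h, a weight column and bias of the familiar form below (any prior log-odds term would be absorbed elsewhere, e.g. into the recurrent prior).

% Standard two-class result for Gaussian likelihoods with shared covariance
% \Sigma: the posterior log-odds are affine in x_t, giving a sigmoid whose
% weights and bias (per hidden feature h) take the form
\begin{align}
  w_h &= \Sigma^{-1}(\mu_h - \nu_h), \\
  b_h &= -\tfrac{1}{2}\,\mu_h^{\top} \Sigma^{-1} \mu_h
         + \tfrac{1}{2}\,\nu_h^{\top} \Sigma^{-1} \nu_h .
\end{align}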
“…Recurrence emerges naturally from Bayes's theorem, which updates a prior probability into a posterior given new observational data. In our previous work on the light Bayesian recurrent unit [19], hidden features were assumed to be interdependent, which led to a layer-wise recurrence for the computation of prior probabilities. In this paper, in a mainly theoretical contribution, we come back to the simpler case of an RNN with unit-wise recurrence and no gate.…”
Section: Introduction (citation type: mentioning, confidence: 99%)
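To make the layer-wise versus unit-wise distinction in this statement concrete, a schematic contrast in our own notation (not the papers') is sketched below: under a layer-wise recurrence the prior for feature h at time t depends on all H posteriors from the previous frame, whereas the unit-wise case ties it to feature h alone.

% Schematic contrast (illustrative notation): how the prior at time t is
% built from the posteriors at time t-1 in the two recurrence styles.
\begin{align}
  \text{layer-wise:} \quad
    P(\phi_{t,h}) &= f_h\!\bigl(P(\phi_{t-1,1} \mid x_{t-1}), \dots,
                                P(\phi_{t-1,H} \mid x_{t-1})\bigr), \\
  \text{unit-wise:} \quad
    P(\phi_{t,h}) &= f\!\bigl(P(\phi_{t-1,h} \mid x_{t-1})\bigr).
\end{align}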