Most learning-based methods require labelled training data, which is time-consuming to produce and prone to labelling errors. To address these labelling issues, we propose an unsupervised learning framework that removes mismatches by maximizing the expected score of sample consensuses (MESAC). MESAC can train various permutation-invariant networks (PINs) on unlabelled data and has three distinct merits: 1) it trains PINs in an unsupervised mode, making them immune to label errors; 2) the gradients of the expected score are calculated explicitly by a revised score-function estimator, which avoids gradient explosion; 3) the distribution of matching probabilities is learned by the PIN and modelled precisely as a categorical distribution, which reduces the number of sampling iterations and thus improves computational efficiency. Experiments on test datasets show that embedding pure PINs in MESAC increases mean recall by up to 77% and mean precision by 16%. Applications in pose recovery indicate that the success rates of MESAC-integrated PINs exceed those of the compared methods when trained without matching labels or ground-truth epipolar geometry (EG) constraints, demonstrating the great potential of MESAC in mismatch removal.
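For orientation, the standard score-function (REINFORCE) identity underlying such estimators is sketched below; the abstract does not detail the paper's revision, so only the baseline form is shown, with illustrative symbols: \(\theta\) for the PIN parameters, \(p_\theta\) for the learned categorical distribution over minimal sample sets \(x\), \(S(x)\) for the consensus score, and \(K\) Monte Carlo samples.

```latex
% Standard score-function (REINFORCE) identity -- baseline form only;
% the paper's "revised" estimator presumably modifies this to curb
% gradient explosion. Symbols are illustrative, not from the paper:
%   \theta      : PIN parameters
%   p_\theta    : categorical distribution over minimal sample sets x
%   S(x)        : consensus score of sample x
\nabla_\theta \, \mathbb{E}_{x \sim p_\theta}\!\left[ S(x) \right]
  = \mathbb{E}_{x \sim p_\theta}\!\left[ S(x) \, \nabla_\theta \log p_\theta(x) \right]
  \approx \frac{1}{K} \sum_{k=1}^{K} S(x_k) \, \nabla_\theta \log p_\theta(x_k),
  \qquad x_k \sim p_\theta .
```

Because \(p_\theta\) is categorical, \(\log p_\theta(x_k)\) is available in closed form, which is what permits the explicit gradient computation and the reduced sampling budget claimed above.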