“…It is possible to use a variational approach to estimate a bound on the mutual information between continuous, high-dimensional quantities (Donsker & Varadhan, 1983; Nguyen et al, 2010; Alemi et al, 2016; Belghazi et al, 2018; Oord et al, 2018; Poole et al, 2019). Recent works capture this intuition to yield self-supervised embeddings in the modalities of imaging (Oord et al, 2018; Hjelm et al, 2018; Bachman et al, 2019; Tian et al, 2019; Hénaff et al, 2019; Löwe et al, 2019; He et al, 2019; Chen et al, 2020; Tian et al, 2020; Wang & Isola, 2020), text (Rivière et al, 2020; Oord et al, 2018; Kong et al, 2019), and audio (Löwe et al, 2019; Oord et al, 2018), with high empirical downstream performance.…”
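As an illustration of the kind of variational bound the passage refers to, the sketch below evaluates the InfoNCE lower bound (Oord et al, 2018; in the form discussed by Poole et al, 2019) on a batch of K paired samples. It is a minimal sketch, not the method of any one cited work: the function name `infonce_lower_bound`, the critic score matrix, and the toy inputs are all assumptions made for this example. The estimate is the batch mean of f(x_i, y_i) - log((1/K) Σ_j exp f(x_i, y_j)), which can never exceed log K.

```python
import numpy as np


def infonce_lower_bound(scores):
    """Estimate the InfoNCE lower bound on I(X; Y) from critic scores.

    scores[i, j] is a (hypothetical) critic value f(x_i, y_j) for a batch
    of K paired samples; the diagonal holds the positive pairs (x_i, y_i).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[0]
    # Numerically stable log-sum-exp over each row (all candidates y_j).
    row_max = scores.max(axis=1, keepdims=True)
    log_sum_exp = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    # log K plus the mean log-softmax probability of the positive pair.
    return float(np.log(k) + (np.diag(scores) - log_sum_exp).mean())


# Toy inputs (assumed for illustration only).
uninformative = np.zeros((8, 8))   # critic cannot tell pairs apart
informative = 50.0 * np.eye(8)     # critic strongly favours true pairs

low = infonce_lower_bound(uninformative)
high = infonce_lower_bound(informative)
```

With an uninformative critic the estimate is zero, while a critic that cleanly separates the true pairs drives it toward its ceiling of log K. This ceiling is one reason such estimators are typically used with large batches when the true mutual information is high.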