“…In an alternative formulation, mixture density networks (MDNs) output a probability distribution that is defined by a weighted sum of analytic pdfs called kernels, such as Gaussian distributions, and can be trained to map data to the corresponding posterior pdfs (Bishop, 2006). MDNs have been applied to surface wave dispersion inversion (Cao et al., 2020; Earp et al., 2020; Meier et al., 2007a, 2007b), two‐dimensional (2D) travel time tomography (Earp & Curtis, 2020), petrophysical inversion (Shahraeeni & Curtis, 2011; Shahraeeni et al., 2012), earthquake source parameter estimation (Käufl et al., 2014, 2015), inversion for Earth's radial seismic structure (de Wit et al., 2013), pore pressure prediction (Karmakar & Maiti, 2019), lithology mapping (Karmakar et al., 2018), wind prediction (Men et al., 2016), acoustic‐articulatory inversion (Richmond, 2007), and nuclei detection (Koohababni et al., 2018). However, MDNs become difficult to train in high-dimensional problems because of numerical instability, and they suffer from mode collapse, in which some modes (maxima) of the posterior pdf are missing from the results (Cui et al., 2019; Curro & Raquet, 2018; Hjorth & Nabney, 1999; Makansi et al., 2019; Rupprecht et al., 2017).…”
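
To make the kernel-sum description concrete, the following is a minimal sketch of how the log density of a one-dimensional Gaussian mixture might be evaluated from MDN-style outputs. It is an illustration under stated assumptions, not the implementation used in any of the cited studies: the function names (`log_sum_exp`, `mdn_log_pdf`) and the parameterization (unnormalized weight logits, kernel means, log standard deviations) are hypothetical choices made here. The log-sum-exp step shows one common way to reduce the kind of numerical instability mentioned above; it does not address mode collapse.

```python
import numpy as np


def log_sum_exp(a, axis=-1):
    """Compute log(sum(exp(a))) stably by subtracting the maximum first."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(
        np.sum(np.exp(a - a_max), axis=axis)
    )


def mdn_log_pdf(m, logits, means, log_sigmas):
    """Log density at scalar m of a 1D Gaussian mixture with K kernels.

    Assumed (hypothetical) network outputs, each of shape (K,):
      logits     -- unnormalized mixture weights
      means      -- kernel means
      log_sigmas -- log standard deviations of the kernels
    """
    # Softmax in log space, so the weights are positive and sum to one.
    log_weights = logits - log_sum_exp(logits)
    sigmas = np.exp(log_sigmas)
    # Log of each Gaussian kernel evaluated at m.
    log_kernels = (
        -0.5 * np.log(2.0 * np.pi)
        - log_sigmas
        - 0.5 * ((m - means) / sigmas) ** 2
    )
    # Log of the weighted sum of kernels, without leaving log space.
    return log_sum_exp(log_weights + log_kernels)


# Example: two equally weighted kernels centred at -2 and +2.
logits = np.zeros(2)
means = np.array([-2.0, 2.0])
log_sigmas = np.log(np.array([0.5, 0.5]))
print(mdn_log_pdf(2.0, logits, means, log_sigmas))
```

In training, the negative of this log density, averaged over data-model pairs, would typically serve as the loss. Summing the kernels directly and then taking the logarithm can underflow to `log(0)` when a sample lies far from every kernel, whereas the log-space form above stays finite.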