<p>The dynamic and thermodynamic drivers of spatially extended climate extremes are difficult to disentangle because of the Earth system&#8217;s high dimensionality, its interconnectedness, and the non-linear relationships between individual processes. One approach in the literature relies on carefully selected case studies and uses dynamical models to obtain physical insights into the development of individual extreme episodes. Other studies focus on the statistical relationship between a class of extreme events and one specific driving mechanism. We aim to complement both approaches with a machine learning framework in three steps. First, the dimensionality of the predictand and of a wide range of potential predictor variables is reduced by an appropriate change of basis. Second, their relationship is modeled by a statistical learner of intermediate complexity, powerful enough to represent the non-linear relationships but simple enough to allow for fast training and extensive experimentation. Third, the contribution of each variable to the overall model performance and to the representation of individual events is assessed using recently developed methods of explainable machine learning.</p><p>As a first example application, we model European heatwaves in ERA5 data based on potential explanatory variables including geopotential, sea level pressure, soil moisture and the wind components of the jet stream. The predictors are summarized by classic principal component analysis (PCA); for the heatwave fields, we rely on a specialized PCA for binary data. A simple neural network is capable of representing a large part of the variability in the reduced space. With the help of Shapley values, we can then quantitatively assess how much information on heatwaves is contained in each variable, and how individual heatwave events differ in terms of the variables by which the model recognizes them. 
This type of explanation explicitly allows for predictors with overlapping information and non-linear interactions. One advantage of our framework is its ability to represent the impact of all suspected drivers on any arbitrarily defined binary event in a single model.</p>
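<p>To illustrate how Shapley values attribute a model output to predictors that interact non-linearly, the following is a minimal sketch and not the implementation used in this work. It computes exact Shapley values for a hypothetical three-predictor toy model; the function names, the predictor labels, the toy model and the all-zero baseline are assumptions made purely for illustration. A predictor that is "absent" from a coalition is set to its baseline value.</p>

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, players):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight of a coalition of size k that does not contain p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s
                total += weight * (value_fn(s | {p}) - value_fn(s))
        phi[p] = total
    return phi


def toy_model(x):
    """Hypothetical stand-in for the trained learner: contains an interaction
    between geopotential and soil moisture and ignores the jet predictor."""
    z, s = x["geopotential"], x["soil_moisture"]
    return z + 0.5 * s + z * s


def make_value_fn(model, event, baseline):
    """Value of a coalition: predictors in the coalition take the event's
    values, all remaining predictors are held at the baseline."""
    def value_fn(coalition):
        x = {name: (event[name] if name in coalition else baseline[name])
             for name in event}
        return model(x)
    return value_fn


# Assumed (made-up) standardized predictor values for one event
event = {"geopotential": 1.0, "soil_moisture": 2.0, "jet": 3.0}
baseline = {name: 0.0 for name in event}

phi = shapley_values(make_value_fn(toy_model, event, baseline), list(event))
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
```

<p>In this toy setting the interaction term is split equally between the two interacting predictors, the unused predictor receives zero attribution, and the attributions sum to the difference between the model output for the event and for the baseline. Exact enumeration scales exponentially with the number of predictors, so sampling-based approximations are typically used for larger models.</p>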