The Wasserstein metric is introduced as a probabilistic method to enable quantitative evaluations of LES combustion models. The metric can be evaluated directly from scatter data, or from statistical results by means of probabilistic reconstruction, and compared against experimental data. The method is derived and generalized for turbulent reacting flows, and applied to validation tests involving the Sydney piloted jet flame. It is shown that the Wasserstein metric is an effective validation tool that extends to multiple scalar quantities, providing an objective and quantitative evaluation of the effects of model deficiencies and boundary conditions on the simulation accuracy. Several test cases are considered, beginning with a comparison of mixture-fraction results and subsequently extending to reactive scalars, including temperature and species mass fractions of CO and CO₂. To demonstrate the versatility of the proposed method in application to multiple datasets, the Wasserstein metric is applied to a series of different simulations that were contributed to the TNF workshop. Analysis of the results makes it possible to identify competing contributions to model deviations, arising from uncertainties in the boundary conditions and from model deficiencies. These applications demonstrate that the Wasserstein metric constitutes an easily applicable mathematical tool that reduces multiscalar combustion data and large datasets to a scalar-valued quantitative measure.
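To illustrate how a Wasserstein metric can be evaluated directly from scatter data, the following minimal sketch computes the 2-Wasserstein distance between two equally sized, equally weighted samples of a single scalar (for example, mixture fraction). It relies on the standard result that, in one dimension, the optimal transport plan pairs the sorted samples. The function name `wasserstein2_1d` and the synthetic data are hypothetical and chosen for illustration only; this is not necessarily the procedure used in this work, and the multiscalar case would instead require solving a general optimal transport problem.

```python
import math
import random


def wasserstein2_1d(sim, ref):
    """2-Wasserstein distance between two equal-size, equally weighted 1D samples.

    Hypothetical helper for illustration; assumes both samples are non-empty
    and have the same number of points.
    """
    if len(sim) != len(ref) or not sim:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1D the optimal transport plan pairs the sorted samples.
    s, r = sorted(sim), sorted(ref)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, r)) / len(s))


# Synthetic placeholder data (assumption: not taken from any measurement).
rng = random.Random(0)
ref = [rng.gauss(0.35, 0.05) for _ in range(1000)]  # stand-in for experimental scatter
sim = [z + 0.02 for z in ref]                       # stand-in for simulation: constant bias
rng.shuffle(sim)                                    # sample order does not matter

w2 = wasserstein2_1d(sim, ref)
print(f"W2 = {w2:.4f}")
```

Because the synthetic "simulation" sample is the reference sample shifted by a constant, the resulting distance equals the magnitude of that shift and carries the same units as the scalar itself, which is what makes the metric readable as a single quantitative error measure.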