Powerful telescopes equipped with multi-fibre or integral field spectrographs combined with detailed models of stellar atmospheres and automated fitting techniques allow for the analysis of large number of stars. These datasets contain a wealth of information that require new analysis techniques to bridge the gap between observations and stellar evolution models. To that end, we develop B (BONN Stellar Astrophysics Interface), a Bayesian statistical method, that is capable of comparing all available observables simultaneously to stellar models while taking observed uncertainties and prior knowledge such as initial mass functions and distributions of stellar rotational velocities into account. B can be used to (1) determine probability distributions of fundamental stellar parameters such as initial masses and stellar ages from complex datasets; (2) predict stellar parameters that were not yet observationally determined; and (3) test stellar models to further advance our understanding of stellar evolution. An important aspect of B is that it singles out stars that cannot be reproduced by stellar models through χ 2 hypothesis tests and posterior predictive checks. B can be used with any set of stellar models and currently supports massive main-sequence single star models of Milky Way and Large and Small Magellanic Cloud composition. We apply our new method to mock stars to demonstrate its functionality and capabilities. In a first application, we use B to test the stellar models of Brott et al. (2011, A&A, 530, A115) by comparing the stellar ages inferred for the primary and secondary stars of eclipsing Milky Way binaries of which the components range in mass between 4.5 and 28 M . Ages are determined from dynamical masses and radii that are known to better than 3%. We show that the stellar models must include rotation because stellar radii can be increased by several percent via centrifugal forces. We find that the average age difference between the primary and secondary stars of the binaries is 0.9 ± 2.3 Myr (95% CI), i.e. that the stellar models reproduce the Milky Way binaries well. The predicted effective temperatures are in agreement for observed effective temperatures for stars cooler than 25 000 K. In hotter stars, i.e. stars earlier than B1-2V and more massive than about 10 M , we find that the observed effective temperatures are on average hotter by 1.1 ± 0.3 kK (95% CI) and the bolometric luminosities are consequently larger by 0.06 ± 0.02 dex (95% CI) than predicted by the stellar models.