We examine the performance of standard pre-main-sequence (PMS) stellar evolution models against the accurately measured properties of a benchmark sample of 26 PMS stars in 13 eclipsing binary (EB) systems having masses 0.04-4.0 M ⊙ and nominal ages ≈1-20 Myr. We provide a definitive compilation of all fundamental properties for the EBs, with a careful and consistent reassessment of observational uncertainties. We also provide a definitive compilation of the various PMS model sets, including physical ingredients and limits of applicability. No set of model isochrones is able to successfully reproduce all of the measured properties of all of the EBs. In the H-R diagram, the masses inferred for the individual stars by the models are accurate to better than 10% at 1 M ⊙ , but below 1 M ⊙ they are discrepant by 50-100%. Adjusting the observed radii and temperatures using empirical relations for the effects of magnetic activity helps to resolve the discrepancies in a few cases, but fails as a general solution. We find evidence that the failure of the models to match the data is linked to the triples in the EB sample; at least half of the EBs possess tertiary companions. Excluding the triples, the models reproduce the stellar masses to better than ∼10% in the H-R diagram, down to 0.5 M ⊙ , below which the current sample is fully contaminated by tertiaries. We consider several mechanisms by which a tertiary might cause changes in the EB properties and thus corrupt the agreement with stellar model predictions. We show that the energies of the tertiary orbits are comparable to that needed to potentially explain the scatter in the EB properties through injection of heat, perhaps involving tidal interaction. It seems from the evidence at hand that this mechanism, however it operates in detail, has more influence on the surface properties of the stars than on their internal structure, as the lithium abundances are broadly in good agreement with model predictions. The EBs that are members of young clusters appear individually coeval to within 20%, but collectively show an apparent age spread of ∼50%, suggesting true age spreads in young clusters. However, this apparent spread in the EB ages may also be the result of scatter in the EB properties induced by tertiaries.