For more than two decades, scientists have been trying to replace the regulatory in vivo Draize eye test with in vitro methods, but so far only partial replacement has been achieved. To better understand the reasons for this, historical in vivo rabbit data were analysed in detail and resampled in order to (1) reveal which of the in vivo endpoints are most important in driving United Nations Globally Harmonized System/European Union Regulation on Classification, Labelling and Packaging (UN GHS/EU CLP) classification for serious eye damage/eye irritation and (2) evaluate the method's within-test variability, so that acceptable and justifiable target values of sensitivity and specificity can be proposed for alternative methods and their combinations in testing strategies. Among the Cat 1 chemicals evaluated, 36–65 % (depending on the database) were classified based only on persistence of effects, with the remainder classified mostly based on severe corneal effects. Iritis was found to rarely drive the classification (<4 % of both Cat 1 and Cat 2 chemicals). The two most important endpoints driving Cat 2 classification are conjunctival redness (75–81 %) and corneal opacity (54–75 %). The resampling analyses demonstrated an overall probability of at least 11 % that chemicals classified as Cat 1 by the Draize eye test could be equally identified as Cat 2, and of about 12 % that Cat 2 chemicals could be equally identified as No Cat. On the other hand, the over-classification error for No Cat and Cat 2 was negligible (<1 %), which strongly suggests a high over-predictive power of the Draize eye test. Moreover, our analyses of the classification drivers suggest the need for a critical revision of the UN GHS/EU CLP decision criteria for the classification of chemicals based on Draize eye test data, in particular for Cat 1 classification based only on persistence of conjunctival effects or on corneal opacity scores of 4.
To successfully replace the regulatory in vivo Draize eye test, it will be important to recognise these uncertainties and to have in vitro tools that address the most important in vivo endpoints identified in this paper.