In hazard assessment, benchmark concentrations (BMCs) and their associated uncertainty are of particular interest for regulatory decision making. BMC estimation involves a series of statistical decisions that depend largely on factors such as experimental design and assay endpoint features. In current practice, the experimenter is often responsible for the data analysis and therefore relies on statistical software, frequently without being aware of the software's default settings or of how they can affect the results. To provide more insight into how statistical decision making can influence the outcomes of data analysis and interpretation, we performed case studies on a large dataset produced by a developmental neurotoxicity (DNT) in vitro battery (DNT IVB). We focused on the estimation of the BMC and its confidence interval (CI), as well as on the final hazard classification. We identified five crucial statistical decisions that experimenters face during data analysis: the choice of replicate averaging, response data normalization, regression modelling, BMC and CI estimation, and the choice of benchmark response levels. In addition, a strength of our data evaluation platform is its integration of endpoint-specific hazard classifications, including flagging systems for uncertain cases, which none of the existing statistical data analysis platforms provides. The insights gained in this study demonstrate the importance of fit-for-purpose, internationally harmonized and accepted data evaluation and analysis procedures for objective hazard classification.
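As a toy illustration of one of the decisions named above, the choice of benchmark response (BMR) level, the following minimal sketch shows how the BMC shifts when only the BMR changes and the fitted curve stays the same. It assumes a decreasing Hill-type concentration-response model, which is a common choice but is not stated here to be the model used in this study. The function names `hill_response` and `bmc` and all parameter values (`top`, `bottom`, `ec50`, `slope`) are hypothetical and chosen only for illustration.

```python
def hill_response(conc, top=100.0, bottom=0.0, ec50=10.0, slope=1.5):
    """Decreasing Hill curve: response (% of control) at concentration `conc`.

    All default parameter values are hypothetical, not taken from the study.
    """
    if conc <= 0:
        return top
    frac = conc**slope / (ec50**slope + conc**slope)
    return top - (top - bottom) * frac


def bmc(bmr, top=100.0, bottom=0.0, ec50=10.0, slope=1.5):
    """Concentration at which the response has dropped by `bmr` percent of `top`.

    Obtained by inverting the Hill curve analytically (assumed model).
    """
    drop = top * bmr / 100.0      # absolute drop corresponding to the BMR
    span = top - bottom           # maximal drop the curve can reach
    if not 0.0 < drop < span:
        raise ValueError("BMR is not reached by this curve")
    frac = drop / span
    return ec50 * (frac / (1.0 - frac)) ** (1.0 / slope)


if __name__ == "__main__":
    # Same curve, three BMR levels: the resulting BMCs differ.
    for level in (10, 25, 50):
        print(f"BMR {level}% -> BMC = {bmc(level):.3f}")
```

With a fixed curve, a lower BMR gives a lower BMC, so two analyses of the same data can report different potency values, and potentially different hazard classifications, purely because of this setting. In practice the curve parameters are themselves estimated with uncertainty, which this sketch deliberately ignores.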