Whether amplitude scaling of ground motion (GM) records introduces bias remains a controversial question in earthquake engineering. In this study, the question is formally defined with a focus on bias in an engineering demand parameter (EDP) obtained from nonlinear response history analyses (NLRHA) when scaled rather than unscaled GM records of the same intensity level are used. The analysed structures are 10 planar steel frame buildings ranging from low- to high-rise. The EDP of interest is the maximum interstory drift ratio, and the structural responses range from linear to collapse. A distinctive feature of the study is its scope: it employs more than 17,000 recorded GM, an unprecedented number, resulting in approximately 3.4 million NLRHA. To make the investigation thorough, the most relevant intensity measures are discussed and considered, together with novel spectral information that describes the sustained vibration amplitude. Potential bias is examined from several perspectives: first with simple and intuitive statistical methods, then with machine learning techniques, and finally with a novel GM selection approach. Across these investigations, no bias was detected within the inherent uncertainty of the calculations. The results indicate that scaled records can safely be used in NLRHA to assess seismic structural behaviour, provided that spectral and scenario compatibility are ensured and the sustained amplitude is verified to be consistent as well.