“…However, there is a growing awareness in the community that the presence of different sources of bias significantly decreases the overall generalisation ability of the models, so that performance reported in internal validation is overestimated relative to evaluation on independent test data (Soneson, Gerster, Delorenzi, 2014; Cohen, Hashir, Brooks, Bertrand; Zech, Badgeley, Liu, Costa, Titano, Oermann, 2018; Maguolo, Nanni). In addition, numerous journal editorials are calling for better development, evaluation and reporting practices for machine learning models intended for clinical application (Mateen, Liley, Denniston, Holmes, Vollmer, 2020; Nagendran, Chen, Lovejoy, Gordon, Komorowski, Harvey, Topol, Ioannidis, Collins, Maruthappu, 2020; Campbell, Lee, Abràmoff, Keane, Ting, Lum, Chiang, 2020; Health, 2020; O’Reilly-Shah, Gentry, Walters, Zivot, Anderson, Tighe, 2020; Health, 2019; Stevens, Mortazavi, Deo, Curtis, Kao, 2020). Underlying these calls are growing concerns about ethics and the risk of harmful outcomes from using AI in medical applications (Campolo, Sanfilippo, Whittaker, Crawford, 2018; Geis, Brady, Wu, Spencer, Ranschaert, Jaremko, Langer, Borondy Kitts, Birch, Shields, van den Hoven van Genderen, Kotter, Wawira Gichoya, Cook, Morgan, Tang, Safdar, Kohli, 2019; Brady, Neri, 2020).…”