“…49 (2) The second group of data sets, referred to as "TDC", comprised 12 data sets that featured pharmacokinetic and toxicological properties and were obtained from the TDC: Caco2_Wang, 52 Lipophilicity_AstraZeneca, 53,54 Solubility_AqSolDB, 55 HydrationFreeEnergy_FreeSolv, 56 PPBR_AZ, 54 VDss_Lombardo, 57 Half_Life_Obach, 58 Clearance_Hepatocyte_AZ, 54,59 Clearance_Microsome_AZ, 54,59 LD50_Zhu, 60 herg_central/hERG_at_1uM, 61 and herg_central/hERG_at_10uM. 61 (3) The third group of data sets, referred to as "ChEMBL", comprised 30 SAR data sets from ChEMBL 62 that were curated by van Tilborg et al. 11 To reduce the computational cost of performing these tests, data set sizes were capped at 10,000 molecules; larger data sets were subsampled at random, using a fixed seed for reproducibility.…”
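The size cap described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `cap_dataset`, the seed value, and the representation of a data set as a list of records are assumptions made for illustration, and only the 10,000-molecule cap and the use of a fixed seed come from the text.

```python
import random

MAX_MOLECULES = 10_000  # cap stated in the text
SEED = 0  # assumption: the text does not give the actual seed value


def cap_dataset(records, max_size=MAX_MOLECULES, seed=SEED):
    """Return the records unchanged if they fit under the cap,
    otherwise a seeded random subsample of exactly max_size records.

    `records` is a hypothetical stand-in for one data set, e.g. a list
    of (SMILES, label) pairs.
    """
    records = list(records)
    if len(records) <= max_size:
        return records
    # A local generator keeps the result reproducible without
    # touching the global random state.
    rng = random.Random(seed)
    return rng.sample(records, max_size)  # sampling without replacement


if __name__ == "__main__":
    # Toy data set with more entries than the cap.
    toy = [(f"mol_{i}", float(i)) for i in range(25_000)]
    capped = cap_dataset(toy)
    print(len(capped))
```

Because the generator is seeded inside the function, repeated calls on the same input return the same subset, which is the property the fixed seed is meant to provide.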