“…Specifically, most predictive models are built on hundreds to thousands of small chemistry datasets that cannot cover enough chemical space [76,118,188]. Moreover, the data is usually scattered across many publications [117,118,122,124,125,128–144,150,155,159–169,173,176,179,183], is unbalanced, and suffers from cutoff ambiguity [118,167]. Furthermore, bioactivity assay data is strongly biased by the assay platform and carries intrinsic experimental error that hampers accurate prediction [189].…”