Motivation
In silico approaches often fail to utilize bioactivity data available for orthologous targets due to insufficient evidence highlighting the benefit for such an approach. Deeper investigation into orthologue chemical space and its influence toward expanding compound and target coverage is necessary to improve the confidence in this practice.ResultsHere we present analysis of the orthologue chemical space in ChEMBL and PubChem and its impact on target prediction. We highlight the number of conflicting bioactivities between human and orthologues is low and annotations are overall compatible. Chemical space analysis shows orthologues are chemically dissimilar to human with high intra-group similarity, suggesting they could effectively extend the chemical space modelled. Based on these observations, we show the benefit of orthologue inclusion in terms of novel target coverage. We also benchmarked predictive models using a time-series split and also using bioactivities from Chemistry Connect and HTS data available at AstraZeneca, showing that orthologue bioactivity inclusion statistically improved performance.Availability and implementationOrthologue-based bioactivity prediction and the compound training set are available at www.github.com/lhm30/PIDGINv2.Supplementary information
Supplementary data are available at Bioinformatics online.