Physicochemical properties are fundamental to predict the pharmacokinetic and pharmacodynamic behavior of drug candidates. Easily calculated descriptors such as molecular weight and logP have been found to correlate with the success rate of clinical trials. These properties have been previously shown to highlight a sweet-spot in the chemical space associated with favorable pharmacokinetics, which is superior against other regions during hit identification and optimization. In this study, we applied self-organizing maps (SOMs) trained on sixteen calculated properties of a subset of known drugs for the analysis of commercially available compound databases, as well as public biological and chemical databases frequently used for drug discovery. Interestingly, several regions of the property space have been identified that are highly overrepresented by commercially available chemical libraries, while we found almost completely unoccupied regions of the maps (commercially neglected chemical space resembling the properties of known drugs). Moreover, these underrepresented portions of the chemical space are compatible with most rigorous property filters applied by the pharma industry in medicinal chemistry optimization programs. Our results suggest that SOMs may be directly utilized in the strategy of library design for drug discovery to sample previously unexplored parts of the chemical space to aim at yet-undruggable targets.
Graphic abstract