Drug discovery projects in the pharmaceutical industry accumulate thousands of chemical structures and tens of thousands of data points from a dozen or more biological and pharmacological assays. Interpreting these data adequately requires understanding which molecular families are present, which structural motifs correlate with measured properties, and which small structural changes cause large property changes. Data visualization and analysis software with sufficient chemical intelligence to support chemists in this task is rare. To help fill this gap, we released our in-house developed, chemistry-aware data analysis program DataWarrior for free public use. This paper gives an overview of DataWarrior's functionality and architecture. As an example, a new unsupervised, 2-dimensional scaling algorithm is presented, which employs vector-based or nonvector-based descriptors to visualize the chemical or pharmacophore space of even large data sets. DataWarrior uses this method to interactively explore chemical space, activity landscapes, and activity cliffs.
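The idea of an activity cliff, a pair of structurally similar molecules with very different measured activity, can be illustrated with a minimal sketch. This is not DataWarrior's implementation: the representation of fingerprints as Python sets of on-bits, the function names, and both thresholds are assumptions chosen only for illustration.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def activity_cliffs(compounds, min_similarity=0.8, min_activity_gap=2.0):
    """Return pairs that are structurally similar but differ strongly in activity.

    compounds: list of (name, fingerprint_set, activity) tuples, with activity
    on a log scale such as pIC50. Both thresholds are illustrative assumptions.
    """
    cliffs = []
    for (n1, fp1, act1), (n2, fp2, act2) in combinations(compounds, 2):
        similarity = tanimoto(fp1, fp2)
        gap = abs(act1 - act2)
        if similarity >= min_similarity and gap >= min_activity_gap:
            cliffs.append((n1, n2, similarity, gap))
    return cliffs


# Hypothetical toy data: A and B share most bits but differ by 2.6 log units.
compounds = [
    ("A", {1, 2, 3, 4, 5}, 8.5),
    ("B", {1, 2, 3, 4, 5, 6}, 5.9),
    ("C", {10, 11, 12}, 6.0),
]
cliffs = activity_cliffs(compounds)
```

In this toy set only the pair A/B qualifies, because it combines a Tanimoto similarity of 5/6 with a large activity gap. The exhaustive pairwise loop is quadratic in the number of compounds, so it is suitable for illustration rather than for large data sets.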
We present OSIRIS, an entirely in-house developed drug discovery informatics system. Its components cover all information handling aspects from compound synthesis via biological testing to preclinical development. Its design principles are platform and vendor independence, a consistent look and feel, and complete coverage of the drug discovery process by custom-tailored applications. These include electronic laboratory notebook applications for biology and chemistry, tools for high-throughput and secondary screening evaluation, chemistry-aware data visualization, physicochemical property prediction, 3D-pharmacophore comparisons, interactive modeling, computing-grid-based ligand-protein docking, and more. Most applications are developed in Java and are built on top of a Java library layer that provides reusable GUI components and cheminformatics functionality, such as chemical editors, structure canonicalization, substructure search, combinatorial enumeration, enhanced stereo perception, force field minimization, and conformation generation.
Several in-house developed descriptors and our in-house docking tool ActDock were compared in virtual screening experiments on the Directory of Useful Decoys (DUD). The results were compared with the chemical fingerprint descriptor from ChemAxon and with the docking results of the original DUD publication. The DUD is the first published data set providing active molecules, decoys, and references for crystal structures of ligand-target complexes. It was designed for the purpose of evaluating docking programs and contains 2950 active compounds against a total of 40 target proteins. Furthermore, for every ligand the data set contains 36 structurally dissimilar decoy compounds with similar physicochemical properties. We extracted the ligands from the target proteins to extend the applicability of the data set to ligand-based virtual screening. Of the 40 target proteins, 37 contained ligands that we used as query molecules for virtual screening evaluation. Using this data set, we carried out a large comparison of four different chemical fingerprints, a topological pharmacophore descriptor, the Flexophore descriptor, and ActDock. The Actelion docking tool relies on an MM2 force field and a pharmacophore point interaction statistic for scoring; the details are described in this publication. In terms of enrichment rates, the chemical fingerprint descriptors performed better than the Flexophore and the docking tool. After molecules chemically similar to the query molecules were removed, the Flexophore descriptor outperformed the chemical descriptors and the topological pharmacophore descriptors. The similarity matrix calculations used in this study showed that the Flexophore is well suited to finding new chemical entities via "scaffold hopping". The Flexophore descriptor can be explored with a Java applet at http://www.cheminformatics.ch in the submenu Tools → Flexophore. Its usage is free of charge and does not require registration.
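Enrichment in a virtual screen is commonly summarized as an enrichment factor: the fraction of actives among the top-ranked molecules divided by the fraction of actives in the whole set. The sketch below shows that standard calculation; the function name, the toy scores, and the 20% cutoff are assumptions for illustration and are not taken from the study.

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked list.

    ranked_labels: 1 for active, 0 for decoy, ordered best score first.
    EF = (actives in top / size of top) / (all actives / all molecules).
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * fraction))
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (n_actives / n_total)


# Hypothetical similarity scores paired with activity labels (1 = active).
scored = [
    (0.95, 1), (0.90, 0), (0.85, 0), (0.80, 0), (0.75, 1),
    (0.60, 0), (0.50, 0), (0.40, 0), (0.30, 0), (0.20, 0),
]
ranked = [label for _, label in sorted(scored, key=lambda t: t[0], reverse=True)]
ef = enrichment_factor(ranked, fraction=0.2)
```

Here one of the two top-ranked molecules is active while the base rate of actives is 20%, so the enrichment factor is 2.5. A value of 1 corresponds to random ranking.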
A novel pharmacophore descriptor, Flexophore, is presented, which considers molecular flexibility when calculating descriptor similarities. The descriptor is a complete reduced graph of the underlying molecule. Its nodes are represented by enhanced MM2 atom types, while the edge descriptions encode the molecular flexibility by means of a histogram of node distances in a diverse conformer distribution. For comparing two descriptor nodes, a statistical function derived from the Cambridge Crystallographic Database is implemented. To assess the capability of the descriptor to describe the bioactivity space, 350 test data sets with 1000 molecules each were compiled. The data sets were spiked with molecules active on one of 18 different targets. In 175 of the 350 data sets, all molecules chemically similar to the query molecules were removed. Virtual screening on these data sets showed that the Flexophore descriptor detects active molecules despite chemical dissimilarity, whereas the results for the screening of the complete data sets show enrichments comparable to chemical fingerprint descriptors. The diversity analysis of the enriched compounds demonstrates that the Flexophore descriptor describes the chemical space orthogonally to chemical fingerprint descriptors.
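The edge description, a histogram of node distances over a set of conformers, can be sketched as follows. This is only an illustration of the general idea: the bin width, the distance range, the overlap measure used to compare two histograms, and all function names are assumptions and are not the published Flexophore definitions.

```python
import math


def distance_histogram(conformers, i, j, bin_width=1.0, max_distance=20.0):
    """Normalized histogram of the distance between nodes i and j.

    conformers: list of conformers, each a list of (x, y, z) node coordinates.
    Distances beyond max_distance are counted in the last bin.
    """
    if not conformers:
        raise ValueError("at least one conformer is required")
    n_bins = int(max_distance / bin_width)
    counts = [0] * n_bins
    for coords in conformers:
        d = math.dist(coords[i], coords[j])
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    return [c / len(conformers) for c in counts]


def histogram_overlap(h1, h2):
    """Overlap of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def two_node_conformers(distances):
    """Hypothetical helper: place node 0 at the origin and node 1 on the x axis."""
    return [[(0.0, 0.0, 0.0), (d, 0.0, 0.0)] for d in distances]


# A rigid pair keeps one distance; a flexible pair samples several distances.
rigid = distance_histogram(two_node_conformers([5.2, 5.2, 5.2, 5.2]), 0, 1)
flexible = distance_histogram(two_node_conformers([3.5, 5.1, 5.9, 8.0]), 0, 1)
overlap = histogram_overlap(rigid, flexible)
```

In this toy case half of the flexible conformers fall into the bin occupied by the rigid pair, so the overlap is 0.5. A histogram retains the information that a flexible edge can adopt the distance of a rigid one, which a single averaged distance would lose.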
The Platinum dataset of protein-bound ligand conformations was used to benchmark the ability of the MMFF94s force field to generate bioactive conformations by minimization of randomly generated conformers. Torsion angle parameters that generally caused wrong geometries were reparameterized by conducting dihedral scans using ab initio calculations at the MP2 level. This reparameterization resulted in a systematic improvement of generated conformations.
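How torsion constants can be recovered from a dihedral scan is sketched below using the three-term cosine form of the MMFF94 torsion energy. The fitting procedure of the study is not described here, so this is an assumption-laden illustration: it treats a single torsion, assumes an evenly spaced scan over the full 360°, and assumes the scanned energy profile is attributable entirely to the torsion term. The function names and the synthetic scan data are hypothetical.

```python
import math


def mmff_torsion_energy(phi, v1, v2, v3):
    """MMFF94-style torsion energy for a dihedral angle phi in radians."""
    return 0.5 * (
        v1 * (1 + math.cos(phi))
        + v2 * (1 - math.cos(2 * phi))
        + v3 * (1 + math.cos(3 * phi))
    )


def fit_torsion_constants(energies):
    """Recover (V1, V2, V3) from energies sampled at phi_k = 2*pi*k/n.

    For an evenly spaced full scan with n >= 7 points, cos(phi), cos(2*phi)
    and cos(3*phi) are mutually orthogonal and orthogonal to a constant
    offset, so each constant follows from a simple projection.
    """
    n = len(energies)
    if n < 7:
        raise ValueError("need at least 7 evenly spaced scan points")
    angles = [2 * math.pi * k / n for k in range(n)]

    def projection(m):
        return 2.0 / n * sum(e * math.cos(m * a) for e, a in zip(energies, angles))

    # E = const + (V1/2) cos(phi) - (V2/2) cos(2 phi) + (V3/2) cos(3 phi)
    return 2 * projection(1), -2 * projection(2), 2 * projection(3)


# Synthetic reference scan in 30 degree steps, shifted by an arbitrary offset.
true_constants = (0.5, 3.0, -0.8)
scan = [
    mmff_torsion_energy(2 * math.pi * k / 12, *true_constants) + 1.7
    for k in range(12)
]
v1, v2, v3 = fit_torsion_constants(scan)
```

On the synthetic scan the projection recovers the constants used to generate it, independent of the energy offset. With real ab initio scan data one would normally first subtract the remaining force field contributions and use a general least-squares fit if the scan points are unevenly spaced.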