Artificial intelligence, in particular machine learning (ML), has emerged as a promising pillar for overcoming the high failure rate in drug development. Here, we present a primer on the ML algorithms most commonly used in drug discovery and development. We also list possible data sources, describe good practices for ML model development and validation, and share a reproducible example. A companion article will summarize applications of ML in drug discovery, drug development, and the postapproval phase.
Background: The recent determination of the complete nucleotide sequence of several Mycobacterium tuberculosis (MTB) genomes allows the use of comparative genomics as a tool for dissecting the nature and consequence of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C), along with the genomes of laboratory strains (H37Rv and H37Ra), provides new insights on the mechanisms of adaptation of this bacterium to the human host. Findings: The genetic variation found in six M. tuberculosis strains does not involve significant genomic rearrangements. Most of the variation results from deletion and transposition events preferentially associated with insertion sequences and genes of the PE/PPE family but not with genes implicated in virulence. Using the Perl-based software islandsanalyser, which creates a representation of the genetic variation in the genome, we identified differences in the patterns of distribution and frequency of the polymorphisms across the genome. The identification of genes displaying strain-specific polymorphisms and the extrapolation of the number of strain-specific polymorphisms to an unlimited number of genomes indicate that the different strains contain a limited number of unique polymorphisms. Conclusion: The comparison of multiple genomes demonstrates that the M. tuberculosis genome is currently undergoing an active process of gene decay, analogous to the adaptation process of obligate bacterial symbionts. This observation opens new perspectives into the evolution and the understanding of the pathogenesis of this bacterium.
Chagas disease is a neglected tropical disease endemic to Latin America, though migratory movements have recently spread it to other regions. Here, we have applied a cascade virtual screening campaign combining ligand- and structure-based methods. In order to find novel inhibitors of putrescine uptake in Trypanosoma cruzi, an ensemble of linear ligand-based classifiers has been applied as an initial screening filter, followed by docking into a homology model of the putrescine permease TcPAT12. A total of 1,000 individual linear classifiers were inferred from a balanced dataset. Subsequently, different schemes were tested to combine the individual classifiers (MIN operator, average ranking, average score, and average voting), with the MIN operator leading to the best performance. The homology model was based on the arginine/agmatine antiporter (AdiC) from Escherichia coli as template. It showed 64% coverage of the entire query sequence and was selected based on the normalized Discrete Optimized Protein Energy parameter and the GA341 score. The modeled structure had 96% of its residues in the allowed regions of the Ramachandran plot, and none of the residues located in non-allowed regions were involved in the active site of the transporter. Positive Predictive Value (PPV) surfaces were applied to optimize the score thresholds to be used in the ligand-based virtual screening step: for that purpose, PPV was charted as a function of the putative yield of actives (in the range 0.001–0.010) and the sensitivity/specificity (Se/Sp) ratio. With a focus on drug repositioning opportunities, the DrugBank and Sweetlead databases were subjected to screening. Among 8 hits, cinnarizine, a drug frequently prescribed for motion sickness and balance disorder, was tested against T. cruzi epimastigotes and amastigotes, confirming its trypanocidal effects and its inhibitory effects on putrescine uptake.
Furthermore, clofazimine, an antibiotic with already proven trypanocidal effects, also displayed inhibitory effects on putrescine uptake. Two other hits, meclizine and butoconazole, also displayed trypanocidal effects (in the case of meclizine, against both epimastigotes and amastigotes), without inhibiting putrescine uptake.
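The ensemble-combination schemes and the predictive-value calculation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the function names (`combine`, `ranks`, `ppv`), the voting threshold of 0.5, and the toy scores are all assumptions made for illustration. The predictive value uses the textbook Bayes expression PPV = Se·Ya / (Se·Ya + (1 − Sp)·(1 − Ya)), where Ya is the assumed yield (prevalence) of actives.

```python
from statistics import mean


def ppv(se, sp, ya):
    """Textbook positive predictive value from sensitivity (se),
    specificity (sp), and yield of actives (ya, i.e. prevalence)."""
    true_pos = se * ya
    false_pos = (1.0 - sp) * (1.0 - ya)
    return true_pos / (true_pos + false_pos)


def ranks(scores):
    """Rank compounds for one model: rank 1 = highest score.
    Ties are broken by input order (a simplifying assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def combine(score_matrix, scheme, threshold=0.5):
    """Combine per-model scores into one value per compound.

    score_matrix[m][c] is the score of model m for compound c.
    The threshold is a hypothetical cutoff used only for voting.
    """
    n_compounds = len(score_matrix[0])
    columns = [[row[c] for row in score_matrix] for c in range(n_compounds)]
    if scheme == "min":
        # MIN operator: the most pessimistic model decides.
        return [min(col) for col in columns]
    if scheme == "average_score":
        return [mean(col) for col in columns]
    if scheme == "average_voting":
        # Fraction of models that score the compound at or above the cutoff.
        return [mean([1.0 if s >= threshold else 0.0 for s in col])
                for col in columns]
    if scheme == "average_ranking":
        # Lower average rank means a better-placed compound.
        per_model = [ranks(row) for row in score_matrix]
        return [mean([r[c] for r in per_model]) for c in range(n_compounds)]
    raise ValueError(f"unknown scheme: {scheme}")


# Toy example: 2 models scoring 3 compounds (made-up numbers).
scores = [[0.9, 0.2, 0.6],
          [0.7, 0.4, 0.3]]
for name in ("min", "average_score", "average_voting", "average_ranking"):
    print(name, combine(scores, name))
print("PPV at Se=0.8, Sp=0.99, Ya=0.01:", ppv(0.8, 0.99, 0.01))
```

With a low yield of actives, even a highly specific model gives a modest PPV, which is one reason a score threshold would be tuned against PPV rather than accuracy alone.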
Early clinical trials of therapies to treat Duchenne muscular dystrophy (DMD), a fatal genetic X-linked pediatric disease, have been designed based on the limited understanding of natural disease progression and variability in clinical measures over different stages of the continuum of the disease. The objective was to inform the design of DMD clinical trials by developing a disease progression model-based clinical trial simulation (CTS) platform based on measures commonly used in DMD trials. Data were integrated from past studies through the Duchenne Regulatory Science Consortium founded by the Critical Path Institute (15 clinical trials and studies, 1505 subjects, 27,252 observations). Using a nonlinear mixed-effects modeling approach, longitudinal dynamics of five measures were modeled (NorthStar Ambulatory Assessment, forced vital capacity, and the velocities of the following three timed functional tests: time to stand from supine, time to climb 4 stairs, and 10 meter walk-run time). The models were validated on external data sets and captured longitudinal changes in the five measures well, including both early disease when function improves as a result of growth and development and the decline in function in later stages. The models can be used in the CTS platform to perform trial simulations to optimize the selection of inclusion/exclusion criteria, selection of measures, and other trial parameters. The data sets and models have been reviewed by the US Food and Drug Administration and the European Medicines Agency; have been accepted into the Fit-for-Purpose and Qualification for Novel Methodologies pathways, respectively; and will be submitted for potential endorsement by both agencies.
Current medical treatments against recurrent pulmonary infections caused by Pseudomonas aeruginosa, such as those occurring in cystic fibrosis (CF), involve the administration of inhalable antibiotics.