Pharmaceutical industry and the art and science of drug development are sorely in need of novel transformative technologies in the current age of digital health and artificial intelligence (AI). Often described as gamechanging technologies, AI and machine learning algorithms have slowly but surely begun to revolutionize pharmaceutical industry and drug development over the past 5 years. In this expert review, we describe the most frequently used machine learning algorithms in drug development pipelines and the -omics databases well poised to support machine learning and drug discovery. Subsequently, we analyze the emerging new computational approaches to drug discovery and the in silico pipelines for drug repositioning and the synergies among -omics system sciences, AI and machine learning. As with system sciences, AI and machine learning embody a system scale and Big Data driven vision for drug discovery and development. We conclude with a future outlook on the ways in which machine learning approaches can be implemented to buttress and expedite drug discovery and precision medicine. As AI and machine learning are rapidly entering pharmaceutical industry and the art and science of drug development, we need to critically examine the attendant prospects and challenges to benefit patients and public health.
Background The field of pharmacogenomics focuses on the way a person’s genome affects his or her response to a certain dose of a specified medication. The main aim is to utilize this information to guide and personalize the treatment in a way that maximizes the clinical benefits and minimizes the risks for the patients, thus fulfilling the promises of personalized medicine. Technological advances in genome sequencing, combined with the development of improved computational methods for the efficient analysis of the huge amount of generated data, have allowed the fast and inexpensive sequencing of a patient’s genome, hence rendering its incorporation into clinical routine practice a realistic possibility. Methods This study exploited thoroughly characterized in functional level SNVs within genes involved in drug metabolism and transport, to train a classifier that would categorize novel variants according to their expected effect on protein functionality. This categorization is based on the available in silico prediction and/or conservation scores, which are selected with the use of recursive feature elimination process. Toward this end, information regarding 190 pharmacovariants was leveraged, alongside with 4 machine learning algorithms, namely AdaBoost, XGBoost, multinomial logistic regression, and random forest, of which the performance was assessed through 5-fold cross validation. Results All models achieved similar performance toward making informed conclusions, with RF model achieving the highest accuracy (85%, 95% CI: 0.79, 0.90), as well as improved overall performance (precision 85%, sensitivity 84%, specificity 94%) and being used for subsequent analyses. When applied on real world WGS data, the selected RF model identified 2 missense variants, expected to lead to decreased function proteins and 1 to increased. As expected, a greater number of variants were highlighted when the approach was used on NGS data derived from targeted resequencing of coding regions. Specifically, 71 variants (out of 156 with sufficient annotation information) were classified as to “Decreased function,” 41 variants as “No” function proteins, and 1 variant in “Increased function.” Conclusion Overall, the proposed RF-based classification model holds promise to lead to an extremely useful variant prioritization and act as a scoring tool with interesting clinical applications in the fields of pharmacogenomics and personalized medicine.
FINDbase (http://www.findbase.org) is a comprehensive data resource recording the prevalence of clinically relevant genomic variants in various populations worldwide, such as pathogenic variants underlying genetic disorders as well as pharmacogenomic biomarkers that can guide drug treatment. Here, we report significant new developments and technological advancements in the database architecture, leading to a completely revamped database structure, querying interface, accompanied with substantial extensions of data content and curation. In particular, the FINDbase upgrade further improves the user experience by introducing responsive features that support a wide variety of mobile and stationary devices, while enhancing computational runtime due to the use of a modern Javascript framework such as ReactJS. Data collection is significantly enriched, with the data records being divided in a Public and Private version, the latter being accessed on the basis of data contribution, according to the microattribution approach, while the front end was redesigned to support the new functionalities and querying tools. The abovementioned updates further enhance the impact of FINDbase, improve the overall user experience, facilitate further data sharing by microattribution, and strengthen Fotios Kounelis, Alexandros Kanterakis, and Andreas Kanavos contributed equally to this study. How to cite this article: Kounelis F, Kanterakis A, Kanavos A, et al. Documentation of clinically relevant genomic biomarker allele frequencies in the next-generation FINDbase worldwide database.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.