Current pharmaceutical formulation development still relies heavily on the traditional trial-and-error methods of pharmaceutical scientists. This approach is laborious, time-consuming and costly. Recently, deep learning has been widely applied in many challenging domains because of its capability for automatic feature extraction. The aim of the present research is to apply deep learning methods to predict pharmaceutical formulations. In this paper, two types of dosage forms were chosen as model systems. Evaluation criteria suitable for pharmaceutics were applied to assess the performance of the models. Moreover, an automatic dataset selection algorithm was developed for selecting representative data as validation and test datasets. Six machine learning methods were compared with deep learning. Results showed that the accuracies of both deep neural networks were above 80% and higher than those of the other machine learning models, indicating good prediction of pharmaceutical formulations. In summary, a deep learning approach employing an automatic data splitting algorithm and evaluation criteria suitable for pharmaceutical formulation data was developed for the prediction of pharmaceutical formulations for the first time. The cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical research from experience-dependent studies to data-driven methodologies.
Note: Zhuyifan Ye and Yilong Yang contributed equally to the manuscript.
ABSTRACT:
Background: Pharmacokinetic evaluation is one of the key processes in drug discovery and development. However, current absorption, distribution, metabolism and excretion (ADME) prediction models still have limited accuracy.
Aim: This study aims to construct an integrated transfer learning and multitask learning approach for developing quantitative structure-activity relationship models to predict four human pharmacokinetic parameters.
Methods: The pharmacokinetic dataset comprised 1104 U.S. FDA-approved small-molecule drugs and included four human pharmacokinetic parameter subsets (oral bioavailability, plasma protein binding rate, apparent volume of distribution at steady state and elimination half-life). The pre-trained model was trained on over 30 million bioactivity data points. An integrated transfer learning and multitask learning approach was established to enhance model generalization.
Results: The pharmacokinetic dataset was split into three parts (60:20:20) for training, validation and testing by the improved Maximum Dissimilarity algorithm, which uses a representative initial set selection algorithm and a weighted distance function. The multitask learning techniques enhanced the predictive ability of the model. The integrated transfer learning and multitask learning model demonstrated the best accuracies, because deep neural networks have a general feature extraction ability and because transfer learning and multitask learning improved model generalization.
Conclusions: The integrated transfer learning and multitask learning approach with the improved dataset splitting algorithm was introduced for the first time to predict pharmacokinetic parameters. This method can be further employed in drug discovery and development.
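To illustrate the kind of dissimilarity-based splitting described in the Results, the following is a minimal sketch of a plain greedy maximum dissimilarity (MaxMin) selection used to hold out 20% of the samples for validation and 20% for testing. It is not the authors' improved algorithm: the representative initial set selection and the weighted distance function are not specified in the abstract, so this sketch assumes an arbitrary seed sample, an unweighted Euclidean distance, and an alternating assignment of selected samples to the validation and test sets. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Unweighted Euclidean distance (assumption: the paper uses a weighted distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_dissimilarity_pick(points, n_pick, seed_index=0):
    """Greedy MaxMin selection: repeatedly pick the sample whose distance
    to its nearest already-picked sample is largest."""
    picked = [seed_index]
    # Distance from every sample to its nearest picked sample so far.
    min_dist = [euclidean(p, points[seed_index]) for p in points]
    while len(picked) < n_pick:
        candidates = (i for i in range(len(points)) if i not in picked)
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], euclidean(p, points[nxt]))
    return picked


def split_60_20_20(points, seed_index=0):
    """Return (train, validation, test) index lists in roughly 60:20:20 proportion."""
    n = len(points)
    n_holdout_each = n // 5  # 20% for validation, 20% for test
    picked = max_dissimilarity_pick(points, 2 * n_holdout_each, seed_index)
    # Assumption: alternate picks so both held-out sets span the feature space.
    validation = picked[0::2]
    test = picked[1::2]
    held_out = set(picked)
    train = [i for i in range(n) if i not in held_out]
    return train, validation, test


if __name__ == "__main__":
    # Toy one-dimensional descriptors, purely illustrative.
    data = [(float(i), 0.0) for i in range(10)]
    train_idx, val_idx, test_idx = split_60_20_20(data)
    print(train_idx, val_idx, test_idx)
```

Because each pick is the sample farthest from everything already selected, the held-out sets tend to cover the extremes of the descriptor space, which is the general motivation for dissimilarity-based splitting over random splitting.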
Most pharmaceutical formulation development is complex, and ideal formulations are generally obtained only after extensive experimentation. Machine learning is increasingly advancing many aspects of modern society and has achieved significant success in multiple subjects. The current research demonstrated that machine learning can be adopted to build highly accurate predictive models for drug/cyclodextrin (CD) systems. Molecular descriptors of compounds and experimental conditions were employed as inputs, and the complexation free energy as the output. Results showed that the light gradient boosting machine provided significantly improved predictive performance over random forest and deep learning. The mean absolute error was 1.38 kJ/mol and the squared correlation coefficient was 0.86. The evaluation of the relative importance of molecular descriptors further revealed the key factors affecting molecular interactions in drug/CD systems. In the specific ketoprofen–CD systems, the machine learning model showed better predictive performance than molecular modeling calculations, while molecular simulation could provide structural, dynamic and energetic information. The integration of machine learning and molecular simulation could produce a synergistic effect for interpreting and predicting pharmaceutical formulations. In conclusion, the developed predictive models were able to quickly and accurately predict the solubilizing capacity of CD systems. The current research has taken an important step toward the application of machine learning in pharmaceutical formulation design.
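The two metrics reported above can be computed as in the following minimal sketch. It assumes that "squared correlation coefficient" means the square of the Pearson correlation between measured and predicted values; the abstract does not say whether this or the coefficient of determination was used. The function names and the toy numbers are hypothetical and are not data from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_correlation(y_true, y_pred):
    """Square of the Pearson correlation coefficient (assumed definition)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return (cov * cov) / (var_t * var_p)


if __name__ == "__main__":
    # Illustrative complexation free energies in kJ/mol (made-up values).
    measured = [-12.0, -15.5, -18.2, -20.1]
    predicted = [-11.0, -16.0, -17.5, -21.3]
    print(mean_absolute_error(measured, predicted))
    print(squared_correlation(measured, predicted))
```

The two metrics capture different things: a constant offset between predictions and measurements leaves the squared correlation at 1 while the mean absolute error grows, which is one reason to report both.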