Although monoclonal antibodies (mAbs) have been shown to be extremely effective in treating a number of diseases, they often suffer from poor developability attributes, such as high viscosity and low solubility at elevated concentrations. Since experimental candidate screening is often material- and labor-intensive, there is substantial interest in developing in silico tools for expediting mAb design. Here, we present a strategy using machine learning-based QSAR models for the a priori estimation of mAb solubility. The extrapolated protein solubilities of a set of 111 antibodies in a histidine buffer were determined using a high-throughput PEG precipitation assay. 3D homology models of the antibodies were generated, and a large set of in-house and commercially available molecular descriptors was then calculated. The resulting experimental and descriptor data were used to develop QSAR models of mAb solubility. After feature selection and training with different machine learning algorithms, the models were evaluated with external test sets. The two regression models developed estimated the solubility values of the external test set data with R² of 0.81 and 0.85. In addition, three-class and binary classification models were developed and shown to be good estimators of mAb solubility behavior, with overall test set accuracies of 0.70 and 0.95, respectively. Analysis of the molecular descriptors selected in these models was also informative and suggested that several charge-based descriptors and isotype may play important roles in mAb solubility. The combination of high-throughput relative solubility measurements with efficient machine learning QSAR models offers an opportunity to rapidly screen potential mAb candidates and to design therapeutics with improved solubility characteristics.
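The workflow in the abstract above (select descriptors, train a model, score it on an external test set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the descriptor names `net_charge` and `hydrophobic_patch`, all numeric values, and the choice of a single-descriptor least-squares fit are assumptions made purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_descriptor(descriptors, target):
    """Pick the descriptor with the largest |r| against the target (training data only)."""
    return max(descriptors, key=lambda name: abs(pearson(descriptors[name], target)))

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical toy data: descriptor names and values are invented, not from the study.
train_X = {
    "net_charge": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "hydrophobic_patch": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
train_y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]  # made-up solubility values
test_X = {"net_charge": [2.5, 4.5, 7.0], "hydrophobic_patch": [2.0, 6.0, 5.0]}
test_y = [5.0, 9.0, 14.0]

# Feature selection and fitting use the training set only;
# the external test set is touched once, for evaluation.
best = select_descriptor(train_X, train_y)
slope, intercept = fit_line(train_X[best], train_y)
test_pred = [slope * v + intercept for v in test_X[best]]
test_r2 = r_squared(test_y, test_pred)
print(best, round(test_r2, 3))
```

The point of the sketch is the separation of roles: descriptor selection and fitting see only the training data, so the test-set R² is an estimate of performance on unseen antibodies rather than a measure of fit.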
Immunoglobulin G-based monoclonal antibodies (mAbs) have become a dominant class of biotherapeutics in recent decades. Approved antibodies are mainly of the subclasses IgG1, IgG2, and IgG4, as well as their derivatives. Over the decades, the selection of IgG subclass has frequently been based on the needs of Fc gamma receptor engagement and effector functions for the desired mechanism of action, while the effect on drug product developability has been less thoroughly characterized. One of the major reasons is the lack of a systematic understanding of the impact of IgG subclass on molecular properties. Several efforts have been made recently to compare molecular property differences among these IgG subclasses, but the conclusions from these studies are sometimes obscured by interference from the variable regions. To further establish mechanistic understanding, we conducted a systematic study by grafting three independent variable regions onto human IgG1, an IgG1 variant, IgG2, and an IgG4 variant constant domains and evaluating the impact of subclass and variable regions on their molecular properties. Structural and computational analysis revealed specific molecular features that potentially account for the differential behavior of the IgG subclasses observed experimentally. Our data indicate that IgG subclass plays a significant role in molecular properties, either through direct effects or via interplay with the variable region; that IgG1 mAbs tend to have higher solubility than either IgG2 or IgG4 mAbs in a common pH 6 buffer matrix; and that solution behavior relies heavily on the charge status of the antibody at the desired pH.
There is growing interest in developing therapeutic mAbs for subcutaneous administration for several reasons, including patient convenience and compliance. This requires identifying mAbs with superior solubility that are amenable to high-concentration formulation development. However, early selection of developable antibodies with optimal high-concentration attributes remains challenging. Since experimental screening is often material- and labor-intensive, there is significant interest in developing robust in silico tools capable of screening thousands of molecules based on sequence information alone. In this paper, we present a strategy applying protein language modeling, named solPredict, to predict the apparent solubility of mAbs in a histidine (pH 6.0) buffer condition. solPredict feeds embeddings extracted from a pretrained protein language model, computed from single sequences, into a shallow neural network. A dataset of 220 diverse, in-house mAbs, with extrapolated protein solubility data obtained from a PEG-induced precipitation method, was used for model training and hyperparameter tuning through five-fold cross-validation. An independent test set of 40 mAbs was used for model evaluation. solPredict achieves high correlation with experimental data (Spearman correlation coefficient = 0.86, Pearson correlation coefficient = 0.84, R² = 0.69, and RMSE = 4.40). The output from solPredict directly corresponds to experimental solubility measurements (PEG %) and enables quantitative interpretation of results. This approach eliminates the need for 3D structure modeling of mAbs, descriptor computation, and expert-crafted input features. The minimal computational expense of solPredict enables rapid, large-scale, and high-throughput screening of mAbs during early antibody discovery.
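The four evaluation metrics named in the abstract above (Spearman, Pearson, R², RMSE) can be computed as in the following sketch. The measured and predicted values are invented for illustration and are not data from the study; R² is computed here as the coefficient of determination, which is an assumption, since the abstract does not state which definition was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the measurements."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical values in PEG %, invented for illustration.
measured = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 11.0, 15.0, 15.0, 19.0]

print("Spearman:", round(spearman(measured, predicted), 4))
print("Pearson: ", round(pearson(measured, predicted), 4))
print("R2:      ", round(r_squared(measured, predicted), 4))
print("RMSE:    ", round(rmse(measured, predicted), 4))
```

The correlation coefficients measure how well predictions track the ordering and linear trend of the measurements, while R² and RMSE also penalize systematic offsets, so the four numbers need not agree and are usefully reported together.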