Motivation: Antibodies are a class of proteins capable of specifically recognizing and binding to a virtually infinite number of antigens. This binding malleability makes them the most valuable category of biopharmaceuticals for both diagnostic and therapeutic applications. The correct identification of the antigen-binding residues in the antibody is crucial for all antibody design and engineering techniques and could also help to understand the complex antigen-binding mechanisms. However, the field of antibody-binding interface prediction still appears rather underdeveloped.
Results: We present a novel method for predicting antibody interfaces from experimentally solved structures, based on 3D Zernike Descriptors. Roto-translationally invariant descriptors are computed from circular patches of the antibody surface enriched with a chosen subset of physico-chemical properties from the AAindex1 amino acid index set, and are used as samples for a binary classification problem. An SVM classifier is used to distinguish interface surface patches from non-interface ones. The proposed method was shown to outperform other antigen-binding interface prediction software.
Availability and implementation: Linux binaries and Python scripts are available at https://github.com/sebastiandaberdaku/AntibodyInterfacePrediction. The datasets generated and/or analyzed during the current study are available at https://doi.org/10.6084/m9.figshare.5442229.
Supplementary information: Supplementary data are available at Bioinformatics online.
Background: The correct determination of protein–protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary a lot depending on the interaction type and function. The selection of an optimal set of features characterising the protein interface and the development of an effective method to represent and capture the complex protein recognition patterns are of paramount importance for this task.
Results: In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. Support Vector Machines are used as a classifier to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI).
Conclusions: The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2043-3) contains supplementary material, which is available to authorized users.
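For readers unfamiliar with the descriptors, the rotation invariance mentioned above can be summarised with the standard textbook formulation of 3D Zernike moments. The notation below is not taken from the abstract: $f$ is assumed to be a function describing one surface patch (for example a voxelised patch carrying a physico-chemical property), scaled to fit inside the unit ball, and the normalisation constant follows one common convention.

```latex
% 3D Zernike functions: a radial polynomial times a spherical harmonic,
% defined for l <= n, (n - l) even, and -l <= m <= l.
Z_{nl}^{m}(r,\vartheta,\varphi) = R_{nl}(r)\, Y_{l}^{m}(\vartheta,\varphi)

% Moments of the patch function f over the unit ball.
\Omega_{nl}^{m} = \frac{3}{4\pi} \int_{\lVert \mathbf{x} \rVert \le 1}
    f(\mathbf{x})\, \overline{Z_{nl}^{m}(\mathbf{x})}\, d\mathbf{x}

% Rotation-invariant descriptor: the norm over the index m.
F_{nl} = \sqrt{\sum_{m=-l}^{l} \bigl\lvert \Omega_{nl}^{m} \bigr\rvert^{2}}
```

A rotation of the patch mixes the moments $\Omega_{nl}^{m}$ only among different $m$ for fixed $n$ and $l$, so the norms $F_{nl}$ are unchanged. The vector of all $F_{nl}$ up to a chosen maximum order is what a classifier would receive as one sample.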
Background: Clinical registers constitute an invaluable resource in the context of data-driven medical decision making. Accurate machine learning and data mining approaches on these data can lead to faster diagnosis, definition of tailored interventions, and improved outcome prediction. A typical issue when implementing such approaches is the almost unavoidable presence of missing values in the collected data. In this work, we propose an imputation algorithm based on a mutual information-weighted k-nearest neighbours approach, able to handle the simultaneous presence of missing information in different types of variables. We developed and validated the method on a clinical register consisting of the information collected over subsequent screening visits of a cohort of patients affected by amyotrophic lateral sclerosis.
Methods: For each subject with missing data to be imputed, we create a feature vector consisting of the information collected over his/her first three months of visits. This vector is used as a sample in a k-nearest neighbours procedure, in order to select, among the other patients, the ones with the most similar temporal evolution of the disease. An ad hoc similarity metric was implemented for the sample comparison, capable of handling the different nature of the data and the presence of multiple missing values, and of including the cross-information among features captured by the mutual information statistic.
Results: We validated the proposed imputation method on an independent test set, where it performed better than three state-of-the-art competitors. We further assessed the validity of our algorithm by comparing the performance of a survival classifier built on the data imputed with our method versus the one built on the data imputed with the best-performing competitor.
Conclusions: Imputation of missing data is a crucial, and often mandatory, step when working with real-world datasets. The algorithm proposed in this work could effectively impute an amyotrophic lateral sclerosis clinical dataset, by handling the temporal and the mixed-type nature of the data and by exploiting the cross-information among features. We also showed how the imputation quality can affect a machine learning task.
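The general idea of a mutual information-weighted k-nearest neighbours imputation can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes purely categorical features, ignores the temporal dimension, and uses a simple weighted mismatch distance. The function names (`mutual_information`, `weighted_distance`, `impute`) and the toy rows are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (nats) of two discrete columns, using only rows where both are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    pxy = Counter(pairs)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def weighted_distance(a, b, target, weights):
    """Weighted fraction of mismatching features, skipping the target and any missing value."""
    num = den = 0.0
    for j, (u, v) in enumerate(zip(a, b)):
        if j == target or u is None or v is None:
            continue
        num += weights[j] * (u != v)
        den += weights[j]
    return num / den if den > 0 else math.inf


def impute(rows, k=3):
    """Fill each missing value (None) with the majority vote of the k nearest donors."""
    n_feat = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_feat)]
    filled = [list(r) for r in rows]
    for t in range(n_feat):
        # Features that share more information with the target weigh more in the distance.
        weights = [mutual_information(columns[j], columns[t]) for j in range(n_feat)]
        for i, row in enumerate(rows):
            if row[t] is not None:
                continue
            donors = [
                (weighted_distance(row, other, t, weights), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[t] is not None
            ]
            donors = sorted(d for d in donors if d[0] < math.inf)[:k]
            if donors:
                votes = Counter(rows[idx][t] for _, idx in donors)
                filled[i][t] = votes.most_common(1)[0][0]
    return filled


# Toy data (invented for illustration): None marks a missing value.
toy_rows = [
    ["A", "lo", "x"],
    ["A", "lo", "x"],
    ["B", "hi", "y"],
    ["B", "hi", "y"],
    ["A", "lo", None],
    ["B", None, "y"],
]
filled = impute(toy_rows, k=3)
```

In this toy setting the two incomplete rows are completed from the donors that agree with them on the informative features. A method for real clinical data would also need a distance term for numerical variables and a way to compare trajectories over time, both of which are left out here.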
Voxel-based representations of surfaces have received considerable interest in bioinformatics and computational biology as a simple and effective way of representing geometrical and physicochemical properties of proteins and other biomolecules. Processing such surfaces for large molecules can be challenging, as space-demanding data structures with associated high computational costs are required. In this paper, we present a methodology for the fast computation of voxelised macromolecular surface representations (namely the van der Waals, solvent-accessible and solvent-excluded surfaces). The proposed method implements a spatial slicing procedure on top of compact data structures to efficiently calculate the three molecular surface representations at high resolutions, in parallel. The spatial slicing protocol ensures a balanced workload distribution and allows the computation of the solvent-excluded surface with minimal synchronisation and communication between processes. This is achieved by adapting a multi-step region-growing Euclidean Distance Transform (EDT) algorithm. At each step, distance values are first calculated independently for every slice; then a small portion of the border information is exchanged between adjacent slices. Very little process communication is also required in the pocket detection procedure, where the algorithm distinguishes surface portions belonging to solvent-accessible pockets from cavities buried inside the molecule. Experimental results are presented to validate the proposed approach.
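The alternation between independent per-slice computation and border exchange can be sketched on a small 2D grid. The sketch below makes several simplifying assumptions that are not in the paper: it uses a city-block (4-neighbour) distance instead of the Euclidean distance transform, it runs the slices one after another rather than in parallel processes, and all names (`grow_in_slice`, `exchange_borders`, `sliced_distance`) are hypothetical.

```python
from collections import deque

INF = float("inf")


def grow_in_slice(dist, lo, hi, width):
    """Region-growing relaxation that only touches rows lo..hi-1 of the grid."""
    queue = deque((r, c) for r in range(lo, hi) for c in range(width) if dist[r][c] < INF)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if lo <= nr < hi and 0 <= nc < width and dist[r][c] + 1 < dist[nr][nc]:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))


def exchange_borders(dist, bounds, width):
    """Let facing border rows of adjacent slices improve each other; report whether anything changed."""
    changed = False
    for _, hi in bounds[:-1]:
        upper, lower = dist[hi - 1], dist[hi]
        for c in range(width):
            if upper[c] + 1 < lower[c]:
                lower[c] = upper[c] + 1
                changed = True
            elif lower[c] + 1 < upper[c]:
                upper[c] = lower[c] + 1
                changed = True
    return changed


def sliced_distance(mask, n_slices):
    """City-block distance of every cell to the nearest truthy cell of mask, computed slice by slice."""
    height, width = len(mask), len(mask[0])
    dist = [[0 if cell else INF for cell in row] for row in mask]
    step = -(-height // n_slices)  # ceiling division: rows per slice
    bounds = [(lo, min(lo + step, height)) for lo in range(0, height, step)]
    changed = True
    while changed:
        for lo, hi in bounds:  # each slice is independent here and could be handled by its own process
            grow_in_slice(dist, lo, hi, width)
        changed = exchange_borders(dist, bounds, width)
    return dist


# Toy grid (invented for illustration): 1 marks a source cell.
toy_mask = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
sliced = sliced_distance(toy_mask, 3)
reference = sliced_distance(toy_mask, 1)  # a single slice is an ordinary whole-grid computation
```

The loop stops when a border exchange changes nothing, at which point every slice is internally consistent and no value can be improved across a slice boundary, so the result matches the single-slice computation. Only one row per boundary is exchanged in each round, which mirrors the small communication volume the abstract describes.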