Background
Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands.
Contribution
In this study we propose a strategy based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and applying the model to the remaining ligands to exclude those predicted as ‘low-scoring’. Then another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling.
Results
We show on four different targets that conformal-prediction-based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
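The iterative strategy in this abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the study uses SVM models and Apache Spark, whereas this toy replaces the SVM with a nearest-centroid nonconformity measure on a single made-up descriptor, and runs in one process. The `dock` function, the `HIGH_CUTOFF` label threshold, the batch size, the significance level `eps`, the efficiency target, and the choice to dock every ligand that is not confidently predicted low are all assumptions made for illustration.

```python
import random

HIGH_CUTOFF = 0.8  # assumption: scores above this count as 'high-scoring'

def dock(x, rng):
    """Hypothetical stand-in for a docking run; returns a score (higher is better)."""
    return x + rng.gauss(0.0, 0.05)

def p_value(cal_scores, score):
    """Conformal p-value: share of calibration nonconformity scores >= score."""
    return (sum(s >= score for s in cal_scores) + 1) / (len(cal_scores) + 1)

def train(labelled):
    """Per class: centroid from a proper-training part, calibration scores from the rest."""
    centroids, calibration = {}, {}
    for label in ("high", "low"):
        xs = [x for x, l in labelled if l == label]
        if len(xs) < 4:
            return None  # too few examples of this class; dock more first
        cut = len(xs) * 2 // 3
        centroids[label] = sum(xs[:cut]) / cut
        calibration[label] = [abs(x - centroids[label]) for x in xs[cut:]]
    return centroids, calibration

def predict_set(x, centroids, calibration, eps):
    """Labels whose class-conditional (Mondrian) p-value exceeds eps."""
    return {label for label, c in centroids.items()
            if p_value(calibration[label], abs(x - c)) > eps}

def screen(library, batch=100, eps=0.2, target_efficiency=0.8, seed=0):
    rng = random.Random(seed)
    remaining = list(library)
    rng.shuffle(remaining)
    labelled, excluded = [], 0
    while remaining:
        # Dock the next batch and add it to the training data.
        chunk, remaining = remaining[:batch], remaining[batch:]
        for x in chunk:
            label = "high" if dock(x, rng) > HIGH_CUTOFF else "low"
            labelled.append((x, label))
        model = train(labelled)
        if model is None:
            continue
        centroids, calibration = model
        sets = [predict_set(x, centroids, calibration, eps) for x in remaining]
        # Exclude ligands whose prediction set is exactly {'low'}.
        keep = [x for x, s in zip(remaining, sets) if s != {"low"}]
        excluded += len(remaining) - len(keep)
        # Efficiency here: share of single-label prediction sets.
        efficiency = sum(len(s) == 1 for s in sets) / max(len(sets), 1)
        remaining = keep
        if efficiency >= target_efficiency:
            break
    # Assumption: everything that was not excluded is docked in a final pass.
    return {"docked": len(labelled) + len(remaining), "excluded": excluded}

result = screen([i / 1999 for i in range(2000)])
print(result)
```

Every ligand ends up either docked or excluded, so the two counts always sum to the library size; how many ligands are skipped depends on the toy score function and the chosen `eps`.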
Background
Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), which relies on hardware with low failure rates and fast network connections. Google’s MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, and providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark.
Results
We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, by docking a publicly available target receptor against 2.2 M compounds. The performance experiments show good parallel efficiency (87%) when running in a public cloud environment.
Conclusion
Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve makes it possible to try out the method on relatively small libraries first and then scale up to larger libraries. Our implementation is named Spark-VS and is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
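The MapReduce idea behind this abstract can be sketched in a few lines. The example below is a single-process illustration, not Spark-VS code and not the Spark API: the `dock` scoring function, the partition count, and the top-n size are hypothetical. The map step scores each partition independently and keeps a local top-n list, and the reduce step merges the partial lists. The `parallel_efficiency` helper uses the common strong-scaling definition, which may differ from the one used in the study.

```python
import heapq
from functools import reduce

def dock(molecule):
    """Hypothetical stand-in for invoking a docking program on one molecule."""
    return (sum(ord(ch) for ch in molecule) % 100) / 10.0

def map_partition(molecules, n):
    """Map step: score every molecule in one partition, keep the local top n."""
    return heapq.nlargest(n, ((dock(m), m) for m in molecules))

def merge_top(a, b, n):
    """Reduce step: merge two partial top-n lists into one top-n list."""
    return heapq.nlargest(n, a + b)

def screen(library, partitions=4, n=3):
    # Round-robin split; on a cluster each chunk would live on a separate worker.
    chunks = [library[i::partitions] for i in range(partitions)]
    partial = [map_partition(chunk, n) for chunk in chunks]
    return reduce(lambda a, b: merge_top(a, b, n), partial)

def parallel_efficiency(t_serial, t_parallel, workers):
    """Strong-scaling efficiency: serial time / (workers * parallel time)."""
    return t_serial / (workers * t_parallel)

library = ["mol-%03d" % i for i in range(50)]  # made-up molecule identifiers
print(screen(library))
```

Because each partition only needs to report its own best candidates, the merge is cheap and a failed partition can be recomputed on its own, which is the property that makes the approach a fit for commodity or cloud hardware.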
This paper presents a novel approach to iris recognition based on dual boundary (Pupil-Iris and Sclera-Iris) detection, followed by a modified Multilayer Feed Forward neural network (MFNN) that performs efficient automatic classification. The novelty of the work is that the proposed method localizes both iris boundaries and uses them as the feature vector for classification. Information extraction starts by preprocessing the eye image to remove specular highlights and then locating the pupil using edge detection. The centroid of the detected pupil is chosen as the reference point for extracting the boundary points, which are recorded using the radius vector function approach. The proposed feature vector is obtained by concatenating the contour points of the Pupil-Iris boundary and the Sclera-Iris boundary, which yields a unique pattern called the iris signature. The proposed method is translation and scale invariant. Classification is performed by the MFNN using a modified version of the back-propagation algorithm with a time-varying learning rate. The proposed system has been tested on a moderate number of images taken from the MMU iris database in the presence of additive noise at different values of signal-to-noise ratio (SNR). Experimental results for percentage recognition show that the proposed method outperforms the single boundary method.
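A rough sketch of how a radius-vector feature of this kind could be built is given below. It is an illustration under stated assumptions rather than the paper's procedure: the boundaries are taken as already-extracted point lists sampled in a fixed angular order, the pupil centroid is approximated by the mean of the pupil boundary points, and scale invariance is obtained by dividing by the mean pupil radius. The function names and the synthetic `ellipse` test contours are hypothetical.

```python
import math

def centroid(points):
    """Mean of the boundary points, used here as the pupil reference point."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def radius_vector(boundary, center):
    """Distance from the reference point to each boundary point, in order."""
    cx, cy = center
    return [math.hypot(x - cx, y - cy) for x, y in boundary]

def iris_signature(pupil_boundary, iris_boundary):
    """Concatenate both radius vectors, normalised by the mean pupil radius."""
    center = centroid(pupil_boundary)
    pupil_r = radius_vector(pupil_boundary, center)
    iris_r = radius_vector(iris_boundary, center)
    scale = sum(pupil_r) / len(pupil_r)  # assumption: normalisation choice
    return [r / scale for r in pupil_r + iris_r]

def ellipse(cx, cy, a, b, n=16):
    """Synthetic contour with n points at equally spaced parameter angles."""
    return [(cx + a * math.cos(2 * math.pi * k / n),
             cy + b * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Same eye shape, then shifted by (40, -7) and enlarged by a factor of 2.5.
original = iris_signature(ellipse(0, 0, 10, 9), ellipse(1, 0, 30, 28))
moved = iris_signature(ellipse(40, -7, 25, 22.5), ellipse(42.5, -7, 75, 70))
print(all(math.isclose(p, q, rel_tol=1e-9) for p, q in zip(original, moved)))
```

Measuring radii from the pupil centroid removes the dependence on where the eye sits in the image, and dividing by a reference radius removes the dependence on image scale, which is one way to obtain the translation and scale invariance the abstract describes.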