One of the fundamental goals in cellular biochemistry is to identify the functions of proteins in the context of the compartments that organize them in the cellular environment. To realize this, it is indispensable to develop an automated method for fast and accurate identification of the subcellular locations of uncharacterized proteins. The current study is focused on predicting plant protein subcellular locations from sequence information alone. Although considerable efforts have been made in this regard, the problem is far from solved. Most of the existing methods can deal with single-location proteins only. Yet proteins with multiple locations may have special biological functions, and this kind of multiplex protein is particularly important for both basic research and drug design. Using the multi-label theory, we present a new predictor called "pLoc-mPlant" by extracting the optimal GO (Gene Ontology) information into Chou's general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validation on the same stringent benchmark dataset indicated that the proposed pLoc-mPlant predictor is remarkably superior to iLoc-Plant, the state-of-the-art method for predicting plant protein subcellular localization. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at , by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Information about the interactions of drug compounds with proteins in cellular networking is very important for drug development. Unfortunately, all the existing predictors for identifying drug-protein interactions were trained on a skewed benchmark dataset in which the number of non-interactive drug-protein pairs is overwhelmingly larger than that of the interactive ones. Training predictors on this kind of highly unbalanced benchmark dataset leads to many interactive drug-protein pairs being mispredicted as non-interactive. Since the minority interactive pairs often contain the most important information for drug design, it is necessary to minimize this kind of misprediction. In this study, we adopted the neighborhood cleaning rule and the synthetic minority over-sampling technique to treat the skewed benchmark datasets and balance the positive and negative subsets. The new benchmark datasets thus obtained are called the optimized benchmark datasets, based on which a new predictor called iDrug-Target was developed that contains four sub-predictors: iDrug-GPCR, iDrug-Chl, iDrug-Ezy, and iDrug-NR, specialized for identifying the interactions of drug compounds with GPCRs (G-protein-coupled receptors), ion channels, enzymes, and NRs (nuclear receptors), respectively. Rigorous cross-validations on a set of experiment-confirmed datasets have indicated that these new predictors remarkably outperformed the existing ones for the same purpose. To maximize users' convenience, a publicly accessible web server for iDrug-Target has been established at http://www.jci-bioinfo.cn/iDrug-Target/, by which users can easily get their desired results. It has not escaped our notice that the aforementioned strategy can be widely used in many other areas as well.
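The over-sampling half of the balancing strategy described above can be illustrated with a minimal sketch. It is not the iDrug-Target implementation: the function name `smote`, the toy two-dimensional feature vectors, and the choice of `k` are all assumptions made for illustration, and the neighborhood cleaning step is not shown. The idea is that each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # relative position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


# Hypothetical feature vectors for drug-protein pairs (illustrative values only).
interactive = [[0.9, 0.8], [1.0, 1.1], [0.8, 1.0], [1.1, 0.9]]             # minority class
non_interactive = [[x / 10, y / 10] for x in range(4) for y in range(5)]   # majority class

# Add just enough synthetic interactive pairs to match the majority class size.
extra = smote(interactive, len(non_interactive) - len(interactive), k=3)
balanced_positive = interactive + extra
print(len(balanced_positive), len(non_interactive))
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the interactive pairs rather than duplicating them exactly.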
Many efforts have been made in predicting the subcellular localization of eukaryotic proteins, but most of the existing methods have the following two limitations: (1) their coverage scope is less than ten locations and hence many organelles in a eukaryotic cell cannot be covered, and (2) they can only be used to deal with single-label systems in which each of the constituent proteins has one and only one location. Yet proteins with multiple locations are particularly interesting, since they may have exceptional functions that are very important for an in-depth understanding of the biological processes in a cell and for selecting drug targets as well. Although several predictors (such as "Euk-mPLoc", "Euk-PLoc 2.0" and "iLoc-Euk") can cover up to 22 different location sites and can also treat multi-labeled proteins, further efforts are needed to improve their prediction quality, particularly in enhancing the absolute true rate and in reducing the absolute false rate. Here we propose a new predictor called "pLoc-mEuk" by extracting the key GO (Gene Ontology) information into the general PseAAC (Pseudo Amino Acid Composition). Rigorous cross-validations on a high-quality and stringent benchmark dataset have indicated that the proposed pLoc-mEuk predictor is remarkably superior to iLoc-Euk, the best of the aforementioned three predictors. To maximize the convenience of most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc-mEuk/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
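The abstract does not define the absolute true rate or the absolute false rate, so the sketch below assumes the definitions commonly used for multi-label systems: absolute true is the fraction of proteins whose predicted location set matches the true set exactly, and absolute false is the average number of wrong labels (missed or spurious) per protein divided by the total number of possible locations. The function names and the toy data are hypothetical.

```python
def absolute_true(true_sets, pred_sets):
    """Fraction of proteins whose predicted location set equals the true set exactly."""
    hits = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return hits / len(true_sets)


def absolute_false(true_sets, pred_sets, n_labels):
    """Mean number of wrong labels per protein, scaled by the number of possible labels."""
    # t ^ p is the symmetric difference: labels that were missed or wrongly predicted
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


# Hypothetical annotations for four proteins (illustrative only).
truth = [{"nucleus"}, {"cytoplasm", "nucleus"}, {"mitochondrion"}, {"cytoplasm"}]
preds = [{"nucleus"}, {"cytoplasm"}, {"mitochondrion"}, {"nucleus"}]
N_LOCATIONS = 22  # number of location sites covered, as stated in the abstract

at = absolute_true(truth, preds)
af = absolute_false(truth, preds, N_LOCATIONS)
print(f"absolute true = {at:.3f}, absolute false = {af:.4f}")
```

Under these assumed definitions, a partially correct prediction for a two-location protein counts as a miss for absolute true but contributes only one wrong label to absolute false, which is why the two rates are reported together.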