One of the fundamental goals in proteomics and cell biology is to identify the
functions of proteins in various cellular organelles and pathways. Information on the
subcellular locations of proteins can provide useful insights for revealing their
functions and understanding how they interact with each other in cellular network
systems. Most of the existing methods for predicting plant protein subcellular
localization can only cover three or four location sites, and none of them can be
used to deal with multiplex plant proteins that can simultaneously exist at, or
move between, two or more different location sites. Actually, such multiplex proteins
might have special biological functions worthy of particular notice. The present
study was devoted to improving the existing plant protein subcellular location
predictors in these two respects. A new predictor called
“Plant-mPLoc” is developed by integrating the gene ontology
information, functional domain information, and sequential evolutionary information
through three different modes of pseudo amino acid composition. It can be used to
identify plant proteins among the following 12 location sites: (1) cell membrane, (2)
cell wall, (3) chloroplast, (4) cytoplasm, (5) endoplasmic reticulum, (6)
extracellular, (7) Golgi apparatus, (8) mitochondrion, (9) nucleus, (10) peroxisome,
(11) plastid, and (12) vacuole. Compared with the existing methods for predicting
plant protein subcellular localization, the new predictor is much more powerful and
flexible. Particularly, it also has the capacity to deal with multiple-location
proteins, which is beyond the reach of any existing predictors specialized for
identifying plant protein subcellular localization. As a user-friendly web-server,
Plant-mPLoc is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/. Moreover, for the
convenience of the vast majority of experimental scientists, a step-by-step guide is
provided on how to use the web-server to get the desired results. It is anticipated
that the Plant-mPLoc predictor as presented in this paper will become a very useful
tool in plant science as well as in all the relevant areas.
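
To illustrate what handling multiple-location proteins means in practice, the following is a minimal sketch of a nearest-neighbour predictor that may return more than one location site. It is not the Plant-mPLoc implementation: the function name `predict_locations`, the parameters `k` and `min_share`, the two-dimensional toy feature vectors, and the voting threshold are all assumptions made for illustration only.

```python
from collections import Counter
from math import dist

# Toy training set (hypothetical): each protein is a feature vector
# paired with the SET of location sites it is annotated with.
TRAIN = [
    ((0.0, 0.0), {"cytoplasm"}),
    ((0.1, 0.1), {"cytoplasm", "nucleus"}),
    ((0.2, 0.0), {"cytoplasm", "nucleus"}),
    ((1.0, 1.0), {"vacuole"}),
    ((1.1, 0.9), {"vacuole"}),
]

def predict_locations(train, query, k=3, min_share=0.5):
    """Return every location carried by at least min_share of the k nearest neighbours."""
    # Rank training proteins by Euclidean distance to the query vector.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Each neighbour votes once for every location in its annotation set.
    counts = Counter(loc for _, locs in neighbours for loc in locs)
    return sorted(loc for loc, n in counts.items()
                  if n / len(neighbours) >= min_share)

print(predict_locations(TRAIN, (0.1, 0.05)))  # ['cytoplasm', 'nucleus']
print(predict_locations(TRAIN, (1.0, 0.95)))  # ['vacuole']
```

Because the output is a list rather than a single label, a query protein close to multiplex training proteins can be assigned two sites at once, while a protein near single-location neighbours still receives one.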
Current plant genome sequencing projects have called for the development of novel and powerful high-throughput tools for the timely annotation of the subcellular location of uncharacterized plant proteins. In view of this, an ensemble classifier, Plant-PLoc, formed by fusing many basic individual classifiers, has been developed for large-scale subcellular location prediction for plant proteins. Each of the basic classifiers was engineered by the K-Nearest Neighbor (KNN) rule. Plant-PLoc discriminates plant proteins among the following 11 subcellular locations: (1) cell wall, (2) chloroplast, (3) cytoplasm, (4) endoplasmic reticulum, (5) extracell, (6) mitochondrion, (7) nucleus, (8) peroxisome, (9) plasma membrane, (10) plastid, and (11) vacuole. As a demonstration, predictions were performed on a stringent benchmark dataset in which none of the proteins included has ≥25% sequence identity to any other in the same subcellular location, to avoid homology bias. The overall success rate thus obtained was 32-51% higher than the rates obtained by the previous methods on the same benchmark dataset. The essence of Plant-PLoc in enhancing the prediction quality and its significance in biological applications are discussed. Plant-PLoc is accessible to the public as a free web-server at http://202.120.37.186/bioinf/plant. Furthermore, for public convenience, results predicted by Plant-PLoc have been provided in a downloadable file at the same website for all plant protein entries in the Swiss-Prot database that do not have subcellular location annotations, or are annotated as being uncertain. The large-scale results will be updated twice a year to include new entries of plant proteins and reflect the continuous development of Plant-PLoc.
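
The idea of fusing basic KNN classifiers into an ensemble can be sketched as follows. This is a simplified illustration, not the Plant-PLoc code: it assumes the basic classifiers differ only in their value of k and that fusion is a plain majority vote, and the names `knn_predict`, `ensemble_predict`, and the toy two-dimensional feature vectors are hypothetical.

```python
from collections import Counter
from math import dist

# Toy training set (hypothetical): feature vector -> single location label.
TRAIN = [
    ((0.0, 0.0), "chloroplast"),
    ((0.1, 0.2), "chloroplast"),
    ((0.2, 0.1), "chloroplast"),
    ((1.0, 1.0), "nucleus"),
    ((0.9, 1.1), "nucleus"),
    ((1.1, 0.9), "nucleus"),
]

def knn_predict(train, query, k):
    """One basic classifier: majority label among the k nearest neighbours."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, the label seen first (i.e. the nearer neighbour's) wins.
    return votes.most_common(1)[0][0]

def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse the basic classifiers (one per k, an assumption) by majority vote."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]

print(ensemble_predict(TRAIN, (0.15, 0.1)))  # chloroplast
print(ensemble_predict(TRAIN, (1.0, 0.95)))  # nucleus
```

Voting across several values of k removes the need to commit to a single k in advance, which is one common motivation for this kind of fusion.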