Glycosylation is one of the most abundant post-translational modifications (PTMs) and is required for modulating the structure and function of proteins in a living cell. Although elucidated only recently in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein-glycan linkage are widespread: N-linked, where a glycan moiety is attached to the amide group of Asn, and O-linked, where a glycan moiety is attached to the hydroxyl group of Ser/Thr/Tyr. Owing to their biological ubiquity, significance, and technological applications, the study of prokaryotic glycoproteins is a fast-emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models use the binary profile of patterns, the composition profile of patterns, and the position-specific scoring matrix profile of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from the phyla Crenarchaeota and Euryarchaeota (domain Archaea) and Proteobacteria (domain Bacteria), and validated O-glycosites from the phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (domain Bacteria). In view of the current understanding that glycosylation occurs on folded proteins in bacteria, hybrid models have been developed that combine predicted secondary structure and accessible surface area with the training features in various ways. Using these models, N-glycosites and O-glycosites could be predicted with an accuracy of 82.71% (MCC 0.65) and 73.71% (MCC 0.48), respectively. 
An evaluation of the best-performing models with 28 independent prokaryotic glycoproteins confirms the suitability of these models for predicting N- and O-glycosites in potential glycoproteins from the aforementioned organisms with reasonably high confidence. A web server, GlycoPP, implementing these models is freely available at http://www.imtech.res.in/raghava/glycopp/.
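To make the "binary profile of patterns" feature and the reported MCC values more concrete, the following is a minimal sketch rather than the GlycoPP implementation. It assumes a pattern is a fixed-length window centred on a candidate residue, that positions beyond the sequence ends are padded with a dummy symbol `X`, and that each position is one-hot encoded over 21 symbols. The function names, the padding symbol, and the window size in the usage note are illustrative assumptions, not details taken from the abstract.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def extract_window(sequence, center, flank):
    """Return the 2*flank+1 residue window centred on index `center`.

    Positions that fall outside the sequence are filled with PAD.
    """
    padded = PAD * flank + sequence + PAD * flank
    return padded[center:center + 2 * flank + 1]


def binary_profile(window):
    """One-hot encode each residue over 21 symbols (20 amino acids + PAD)."""
    alphabet = AMINO_ACIDS + PAD
    features = []
    for residue in window:
        vec = [0] * len(alphabet)
        idx = alphabet.find(residue)
        if idx == -1:
            # Non-standard residues are treated like padding (an assumption).
            idx = alphabet.index(PAD)
        vec[idx] = 1
        features.extend(vec)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when the coefficient is undefined
    return (tp * tn - fp * fn) / denom
```

For example, `binary_profile(extract_window("MKNASTL", 2, 2))` encodes the five-residue window around the Asn at index 2 as a vector of length 105, which could then be passed to an SVM together with a glycosylated or non-glycosylated label.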
Nα-acetylation is a naturally occurring, irreversible modification of the N-termini of proteins catalyzed by Nα-acetyltransferases (NATs). Although present in all three domains of life, it is poorly understood in bacteria. In eukaryotes, the functional grouping of NATs into six types, NatA to NatF, is based on subunit requirements and stringent substrate specificities. Bacterial orthologs are phylogenetically divergent from eukaryotic NATs, and only a couple of them have been characterized biochemically. Accordingly, not much is known about their substrate specificities. Rv3420c of Mycobacterium tuberculosis is a NAT ortholog coding for RimIMtb. Using in vitro peptide-based enzyme assays and mass-spectrometry methods, we provide evidence that RimIMtb is a protein Nα-acetyltransferase of relaxed substrate specificity that mimics the substrate specificities of eukaryotic NatA, NatC and, most competently, NatE. In addition, the hitherto unknown acetylation of Asp, Glu, Tyr and Leu residues by a bacterial NAT (RimIMtb) is demonstrated in vitro. Based on in vivo acetylation status, in vitro assay results and genetic context, a plausible cellular substrate for RimIMtb is proposed.