Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Several modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel and the proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analysis. While pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods.
Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have undertaken the benchmarking of pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance will strongly depend on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction.
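To make the definition of pI as the pH of zero net charge concrete, the following is a minimal sketch of what an iterative pI calculation can look like: the net charge is estimated with the Henderson-Hasselbalch equation and the zero crossing is located by bisection. The pKa values, the function names `net_charge` and `isoelectric_point`, and the example sequence are illustrative assumptions chosen for this sketch; they are not the parameter sets or the implementation benchmarked here, and the computed pI shifts directly with whichever pKa values are supplied.

```python
# Illustrative pKa values only (an assumption of this sketch, not a set
# taken from the benchmarked methods). Swap in any published pKa set.
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 3.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide at a given pH.

    Each ionizable group contributes its Henderson-Hasselbalch
    fractional charge; one free N-terminus and one free C-terminus
    are assumed.
    """
    seq = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "Nterm" else seq.count(group)
        # Fraction of the basic group that is protonated (+1).
        charge += count / (1.0 + 10.0 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "Cterm" else seq.count(group)
        # Fraction of the acidic group that is deprotonated (-1).
        charge -= count / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH at which net_charge is zero by bisection on [0, 14].

    The net charge decreases monotonically with pH, so the sign of the
    charge at the midpoint tells us which half contains the pI.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0.0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # neutral or negative: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIK"  # hypothetical example sequence
    print(f"Estimated pI of {peptide}: {isoelectric_point(peptide):.2f}")
```

Because every term in `net_charge` depends on a tabulated pKa, replacing the two dictionaries is enough to compare how different pKa sets change the predicted pI for the same sequence.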
Contact: yperez@ebi.ac.uk
Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR.
Supplementary information: Supplementary data are available at Bioinformatics online.
Mass spectrometry is now firmly established as a powerful technique for the identification and characterization of proteins when used in conjunction with sequence databases. Various approaches involving stable-isotope labeling have been developed for quantitative comparisons between paired samples in proteomic expression analysis by mass spectrometry. However, interpretation of such mass spectra is far from being fully automated, mainly due to the difficulty of analyzing complex patterns resulting from the overlap of multiple peaks arising from the assortment of natural isotopes. In order to facilitate the interpretation of a complex mass spectrum of such a mixture, such as an MS spectrum of a stable-isotope-enriched ion species, we report on the development of a software application, 'Matching' (web accessible), that enables the automatic matching of theoretical isotope envelopes to multiple ion peaks in a raw spectrum. It is particularly useful for resolving the relative abundances of narrow-split paired peaks caused by enrichment with a stable isotope, such as 18O, 13C, 2H, or 15N.