Top-down proteomics is the analysis of proteins in their intact form
without proteolysis, thus preserving valuable information about
post-translational modifications, isoforms, and proteolytic processing.
However, it is still a developing field due to limitations in the
instrumentation, difficulties with the interpretation of complex mass
spectra, and a lack of well-established quantification approaches.
TopPIC is a popular tool for proteoform identification.
We extended its capabilities to label-free proteoform quantification
by developing a companion R package (TopPICR). Key steps in the TopPICR
pipeline include filtering identifications, inferring a minimal set
of protein accessions explaining the observed sequences, aligning
retention times, recalibrating measured masses, clustering features
across data sets, and finally compiling feature intensities using
the match-between-runs approach. The output of the pipeline is an
MSnSet object, which makes downstream data analysis directly compatible
with packages from the Bioconductor project. TopPICR also provides the
capability for visualizing proteoforms within the context of the parent
protein sequence. The functionality of TopPICR is demonstrated on
top-down LC-MS/MS data sets of 10 human-in-mouse xenografts of luminal
and basal breast tumor samples.
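
To make one of the pipeline steps concrete, the inference of a minimal set of protein accessions explaining the observed sequences can be viewed as a set-cover problem. The sketch below is a Python illustration using a greedy heuristic. It is an assumption made for illustration, not the TopPICR implementation or API: the function name, the input mapping, and the accession labels are hypothetical, and a greedy pass yields a small covering set that is not guaranteed to be the smallest possible.

```python
from typing import Dict, List, Set


def infer_minimal_accessions(seq_to_accessions: Dict[str, Set[str]]) -> List[str]:
    """Greedy set cover: pick accessions until every sequence is explained.

    Hypothetical helper for illustration only; not part of TopPICR.
    """
    # Invert the mapping: accession -> sequences it could explain.
    acc_to_seqs: Dict[str, Set[str]] = {}
    for seq, accessions in seq_to_accessions.items():
        for acc in accessions:
            acc_to_seqs.setdefault(acc, set()).add(seq)

    # Only sequences with at least one candidate accession can be covered.
    uncovered = {seq for seq, accs in seq_to_accessions.items() if accs}
    chosen: List[str] = []
    while uncovered:
        # Prefer the accession explaining the most still-uncovered sequences;
        # ties are broken alphabetically so the result is deterministic.
        best = min(acc_to_seqs, key=lambda a: (-len(acc_to_seqs[a] & uncovered), a))
        chosen.append(best)
        uncovered -= acc_to_seqs[best]
    return chosen


# Toy input: each observed sequence maps to the accessions it matches.
example = {
    "SEQ_A": {"P1", "P2"},
    "SEQ_B": {"P1"},
    "SEQ_C": {"P1", "P3"},
    "SEQ_D": {"P3", "P4"},
}
selected = infer_minimal_accessions(example)
print(selected)
```

In this toy input, P1 alone explains three of the four sequences, so only one further accession is needed to cover the remaining one.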