A Bioinformatics Workflow for Variant Peptide Detection in Shotgun Proteomics

Li, Jing; Su, Zengliu; Ma, Zeqiang; Slebos, Robbert J.C.; Halvey, Patrick J.; Tabb, David L.; Liebler, Daniel C.; Pao, William; Zhang, Bing

doi:10.1074/mcp.m110.006536

Cited by 90 publications

(136 citation statements)

References 47 publications

Supporting

Mentioning

133

Contrasting

Unclassified


“…11 Subsequently, this group developed an integrated bioinformatics workflow to detect variant peptides, and 204 variant peptides, including 5 peptides known for cancer-related mutations, were identified from three colorectal tumor specimens. 12 A more comprehensive variant-associated database was constructed by Song et al, where the humsavar database and CanProVar database were integrated into the UniProtKB/Swiss-Prot canonical protein database and 282 unique SAAVs sites were quantified in human liver tissues. 13 In other work, 128 SAAVs paired with related canonical peptides using a customized database from neXtprot were identified.…”

Section: Introduction

mentioning

confidence: 99%

Single Amino Acid Variant Profiles of Subpopulations in the MCF-7 Breast Cancer Cell Line

et al. 2017


Cancers are initiated and developed from a small population of stem-like cells termed cancer stem cells (CSCs). There is heterogeneity among this CSC population that leads to multiple subpopulations with their own distinct biological features and protein expression. The protein expression and function may be impacted by amino acid variants that can occur largely due to single nucleotide changes. We have thus performed proteomic analysis of breast CSC subpopulations by mass spectrometry to study the presence of single amino acid variants (SAAVs) and their relation to breast cancer. We have used CSC markers to isolate pure breast CSC subpopulation fractions (ALDH+ and CD44+/CD24− cell populations) and the mature luminal cells (CD49f−EpCAM+) from the MCF-7 breast cancer cell line. By searching the Swiss-CanSAAVs database, 374 unique SAAVs were identified in total, where 27 are cancer-related SAAVs. 135 unique SAAVs were found in the CSC population compared with the mature luminal cells. The distribution of SAAVs detected in MCF-7 cells was compared with those predicted from the Swiss-CanSAAVs database, where we found distinct differences in the numbers of SAAVs detected relative to that expected from the Swiss-CanSAAVs database for several of the amino acids.


“…Therefore, it is essential that data acquired through MPS are used to create tumor-specific databases, incorporating the possibility of variant proteins arising through somatic mutation, inherited polymorphisms, alternatively spliced isoforms, and novel expression. The goal of this study was to analyze the flow of information through the central dogma of biology in an unbiased and comprehensive way to profoundly understand the aberrant information flux that underlies all cancer biology (2-5).…”

mentioning

confidence: 99%

An Analysis of the Sensitivity of Proteogenomic Mapping of Somatic Mutations and Novel Splicing Events in Cancer

Ruggles, Tang, Wang, et al. 2016

Molecular & Cellular Proteomics

Self Cite


Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations, and splice variants identified in cancer cells are translated. Herein, we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome, and global proteome datasets generated from a pair of luminal and basal-like breast-cancer-patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates, defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over 30 sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (~80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identify gaps in sequence coverage, thereby benchmarking current technology and progress toward whole cancer proteome and transcriptome analysis.

Massively parallel sequencing (MPS) of cancer genomes has demonstrated enormous complexity, and it is often unclear which somatic mutations drive tumor biology and which are nonfunctional passenger mutations that passively accumulate. RNA sequencing is frequently used to determine which nucleotide variants are transcribed and therefore have the potential for biological function. However, many mutations detected at the DNA level are not observed at the mRNA level, and their observation is dependent upon expression and the stability of the mRNA (1). Mutation detection at the peptide level clearly increases the confidence that any given variant is


“…The generation of well curated specific database derived from RNA-Seq from each specific sample [9,10], and searches it against proteomic data from the same sample. As a result the targeted database search will enhance the chances of right protein variant identifications with high confidence.…”

Section: Introduction

mentioning

confidence: 99%