Protein Quantification in Label-Free LC-MS Experiments

Clough, Timothy; Key, Melissa; Ott, Ilka; Ragg, Susanne; Schadow, Gunther; Vitek, Olga

doi:10.1021/pr900610q

Cited by 98 publications

(112 citation statements)

References 13 publications

Supporting

Mentioning

111

Contrasting

“…To identify proteins most significantly affected by the stimulus and to address the challenges posed by missing values, we used linear mixed-effects modeling (LiME). LiME, an improvement over ad hoc cutoffs or simple feature averaging, takes advantage of inherent replicate structure of the data and leverages information from a series of biological conditions to identify the significantly affected proteins (26,27). For PRKDC, LiME analysis revealed a systematic increase in peak area for the [s/t]Q containing phosphoPSMs, with >100-fold increase (P < 0.001) between combo and control treatments (Fig.…”

Section: Results (mentioning)

confidence: 99%

Phosphoproteomic characterization of DNA damage response in melanoma cells following MEK/PI3K dual inhibition

Kirkpatrick, Bustos, Dogan et al. 2013. Proc. Natl. Acad. Sci. U.S.A.

“…Benchmark Peptide-based Model. We start from the peptide-based linear regression models as proposed by Daly et al. (39), Clough et al. (22), and Karpievitch et al. (40), of which we have independently proven their superior performance compared to summarization-based workflows (21). In general, the following model is proposed: (1) ridge regression, which leads to shrunken yet more stable log2 fold change (FC) estimates, (2) Empirical Bayes estimation of the variance, which further stabilizes variance estimators, and (3) M-estimation with Huber weights, which reduces the impact of outlying peptide intensities.…”

Section: Methods (mentioning)

confidence: 99%

“…Peptide-based linear regression models estimate protein fold changes directly from peptide intensities and outperform summarization-based methods by reducing bias and generating more correct precision estimates (21,22). However, peptide-based linear regression models suffer from overfitting due to extreme observations and the unbalanced nature of proteomics data; i.e.…”

mentioning

confidence: 99%

Peptide-level Robust Ridge Regression Improves Estimation, Sensitivity, and Specificity in Data-dependent Quantitative Label-free Shotgun Proteomics

Goeminne, Gevaert, Clement. 2016. Molecular & Cellular Proteomics.

Peptide intensities from mass spectra are increasingly used for relative quantitation of proteins in complex samples. However, numerous issues inherent to the mass spectrometry workflow turn quantitative proteomic data analysis into a crucial challenge. We and others have shown that modeling at the peptide level outperforms classical summarization-based approaches, which typically also discard a lot of proteins at the data preprocessing step. Peptide-based linear regression models, however, still suffer from unbalanced datasets due to missing peptide intensities, outlying peptide intensities and overfitting. Here, we further improve upon peptide-based models by three modular extensions: ridge regression, improved variance estimation by borrowing information across proteins with empirical Bayes and M-estimation with Huber weights. We illustrate our method on the CPTAC spike-in study and on a study comparing wild-type and ArgP knock-out Francisella tularensis proteomes. We show that the fold change estimates of our robust approach are more precise and more accurate than those from state-of-the-art summarization-based methods and peptide-based regression models, which leads to an improved sensitivity and specificity. We also demonstrate that ionization competition effects come already into play at very low spike-in concentrations and confirm that analyses with peptide-based regression methods on peptide intensity values aggregated by charge state and modification status (e.g. …

High-throughput LC-MS-based proteomic workflows are widely used to quantify differential protein abundance between samples. Relative protein quantification can be achieved by stable isotope labeling workflows such as metabolic (1, 2) and postmetabolic labeling (3-6). These types of experiments generally avoid run-to-run differences in the measured peptide (and thus protein) content by pooling and analyzing differentially labeled samples in a single run.

Label-free quantitative (LFQ) workflows become increasingly popular as the often expensive and time-consuming labeling protocols are omitted. Moreover, LFQ proteomics allows for more flexibility in comparing samples and tends to cover a larger area of the proteome at a higher dynamic range (7,8). Nevertheless, the nature of the LFQ protocol makes shotgun proteomic data analysis a challenging task. Missing values are omnipresent in proteomic data generated by data-dependent acquisition workflows, for instance because of low-abundant peptides that are not always fragmented in complex peptide mixtures and a limited number of modifications and mutations that can be accounted for in the feature search. Moreover, the overall abundance of a peptide is determined by the surroundings of its corresponding cleavage sites as these influence protease cleavage efficiency (9). Similarly, some peptides are more easily ionized than others (10). These issues not only lead to missing peptides, but also increase variability in individual peptide intensities. The discrete nature of MS1 sampling following continuous ...
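Two of the three extensions named in the abstract, ridge regression and M-estimation with Huber weights, can be combined in a small iteratively reweighted least squares loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `huber_ridge`, the penalty `lam`, the tuning constant `k = 1.345`, the MAD-based scale, and the simulated intensities are all illustrative choices, and the empirical Bayes variance step is omitted.

```python
import numpy as np

def huber_ridge(X, y, lam=1.0, k=1.345, n_iter=50, tol=1e-8):
    """Ridge regression with Huber weights, fitted by IRLS (illustrative sketch).

    Assumes the first column of X is the intercept, which is left unpenalized.
    """
    n, p = X.shape
    penalty = lam * np.eye(p)
    penalty[0, 0] = 0.0                      # do not shrink the intercept
    w = np.ones(n)                           # start from ordinary ridge
    beta = np.zeros(p)
    for _ in range(n_iter):
        XtW = X.T * w                        # scale each observation by its weight
        beta_new = np.linalg.solve(XtW @ X + penalty, XtW @ y)
        resid = y - X @ beta_new
        # Robust residual scale: median absolute deviation, normal-consistent
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        u = np.abs(resid) / max(scale, 1e-12)
        # Huber weights: 1 inside the threshold, k/|u| outside it
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w

# Simulated log2 intensities for two groups with a true log2 fold change of 1
rng = np.random.default_rng(0)
n = 40
group = np.repeat([0.0, 1.0], n // 2)
X = np.column_stack([np.ones(n), group])
y = 20.0 + 1.0 * group + rng.normal(0, 0.1, n)
y[0] += 8.0                                  # one outlying peptide intensity

beta, w = huber_ridge(X, y, lam=0.5)
print("estimated log2 FC:", beta[1])
print("weight of the outlier:", w[0])
```

The outlying observation receives a small weight, so it barely moves the fold change estimate, while the ridge penalty pulls the estimate slightly toward zero.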

“…Hrydziuszko and Viant, 2011; Wang et al, 2012; Webb-Robertson et al, 2015) which presents a significant challenge for statistical analysis (see e.g. Clough et al, 2009). Analysis of such datasets can follow one of two approaches of either eliminating missing values prior to analysis or using methods that integrate missing values in the testing procedure.…”

Section: Introduction (mentioning)

confidence: 99%

Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens

et al. 2016

Motivation: High-throughput mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. Results: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. Availability and Implementation: We provide R functions to implement and illustrate our method as supplementary information.
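The idea of a two-part statistic with a permutation null can be illustrated in its simpler univariate, single-biospecimen form. The sketch below is an assumption-laden simplification and not the multivariate procedure of the abstract above: it adds a squared z statistic for the difference in the proportion of observed values to a squared Welch t statistic on the observed values, then permutes group labels. The function names and the simulated data are hypothetical.

```python
import numpy as np

def two_part_stat(values, groups):
    """Squared z for observed-value proportions plus squared Welch t on observed values."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    a, b = values[groups == 0], values[groups == 1]
    obs_a, obs_b = a[~np.isnan(a)], b[~np.isnan(b)]
    stat = 0.0
    # Part 1: do the groups differ in how often the compound is observed?
    p_a, p_b = len(obs_a) / len(a), len(obs_b) / len(b)
    p = (len(obs_a) + len(obs_b)) / (len(a) + len(b))
    if 0.0 < p < 1.0:
        se = np.sqrt(p * (1.0 - p) * (1.0 / len(a) + 1.0 / len(b)))
        stat += ((p_a - p_b) / se) ** 2
    # Part 2: do the observed intensities differ in mean?
    if len(obs_a) >= 2 and len(obs_b) >= 2:
        se = np.sqrt(obs_a.var(ddof=1) / len(obs_a) + obs_b.var(ddof=1) / len(obs_b))
        if se > 0.0:
            stat += ((obs_a.mean() - obs_b.mean()) / se) ** 2
    return stat

def permutation_pvalue(values, groups, n_perm=2000, seed=0):
    """P-value from a null distribution built by permuting the group labels."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = two_part_stat(values, groups)
    exceed = sum(
        two_part_stat(values, rng.permutation(groups)) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)      # add-one correction avoids p = 0

# Simulated compound: lower abundance and more missing values in group 0
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 15)
values = np.concatenate([rng.normal(10, 1, 15), rng.normal(12, 1, 15)])
values[:7] = np.nan

p_value = permutation_pvalue(values, groups)
print("permutation p-value:", p_value)
```

Because missing values enter the statistic through the first part, no imputation or filtering is needed before testing.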
