Multiple testing corrections are a useful tool for restricting the FDR, but can be blunt in the context of low power, as we demonstrate with a series of simple simulations. Unfortunately, low power can be common in proteomics experiments, driven by proteomics-specific issues such as small effects due to ratio compression and few replicates due to high reagent cost, limited instrument time, and other constraints; in such situations, most multiple testing correction methods, if used with conventional thresholds, will fail to detect any true positives even when many exist. In this low-power, medium-scale situation, other methods such as effect size considerations or peptide-level calculations may be a more effective option, even if they do not offer the same theoretical guarantee of a low FDR. Thus, we aim to highlight in this article that proteomics presents some specific challenges to the standard multiple testing correction methods, which should be employed as a useful tool but not be regarded as a required rubber stamp.
Keywords: FDR / Multiple testing corrections / Shotgun proteomics
Viewpoint
Multiple testing corrections come in many "flavors." They are employed to limit the number of false positives occurring by chance when an analysis is repeated many times, and thus to reduce the FDR at the analysis level. In proteomics they were initially borrowed from microarray research and other high-throughput areas, where they quickly became the norm. The informative review by Diz [1] shows that multiple testing corrections were seldom used in quantitative proteomics until relatively recently, and recommends a sensible multimethod approach. Here, we suggest that one reason for the slower uptake of such methods in proteomics experiments, despite their ease of use and theoretical appeal, is that they remain a useful but blunt tool that is less effective in discovery proteomics than in, for example, the microarray environment.
In this paper, we describe five key factors that combine to make multiple testing corrections less effective in proteomics: medium problem scale; lower effect size due to possible ratio compression; lower analysis power due to high cost; the percentage of data showing an effect; and data distribution quirks. We then discuss some simple alternatives that can help reduce, or at least understand, the FDR in this medium-scale, low-power situation.
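The loss of discoveries under low power can be illustrated with a minimal simulation sketch. This is not the authors' simulation: it assumes a known-variance two-group z-test, the Benjamini-Hochberg procedure as one representative correction, and hypothetical parameter values (2000 proteins, 5% truly changed, effect sizes in standard deviation units). The function names `benjamini_hochberg` and `simulate` are illustrative only.

```python
import random
from statistics import NormalDist


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans: True where BH rejects the null at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def simulate(m=2000, frac_true=0.05, effect_sd=0.5, n_per_group=3,
             alpha=0.05, seed=1):
    """Simulate one experiment; return (true positives, false positives)."""
    rng = random.Random(seed)
    norm = NormalDist()
    n_true = int(m * frac_true)
    # Assumption: known-variance two-group z-test, so the statistic is
    # N(shift, 1) with shift = effect_sd * sqrt(n / 2) for changed proteins
    # and N(0, 1) for unchanged ones.
    shift = effect_sd * (n_per_group / 2) ** 0.5
    is_true = [i < n_true for i in range(m)]
    pvals = []
    for changed in is_true:
        z = rng.gauss(shift if changed else 0.0, 1.0)
        pvals.append(2 * (1 - norm.cdf(abs(z))))  # two-sided p-value
    rejected = benjamini_hochberg(pvals, alpha)
    tp = sum(r and t for r, t in zip(rejected, is_true))
    fp = sum(r and not t for r, t in zip(rejected, is_true))
    return tp, fp


if __name__ == "__main__":
    print("low power :", simulate(effect_sd=0.5, n_per_group=3))
    print("high power:", simulate(effect_sd=2.0, n_per_group=6))
```

Under these assumed settings, the low-power run (small effect, three replicates per group) would be expected to recover few or none of the 100 truly changed proteins at the conventional 0.05 threshold, whereas the high-power run recovers many more.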
Data-independent acquisition methods such as SWATH for mass spectrometry-based proteomics are usually performed with peptide MS/MS assay libraries, which enable identification and quantitation of peptide peak areas. Reference assay libraries can be generated locally through information-dependent acquisition, or obtained from community data repositories for commonly studied organisms. However, no studies have systematically evaluated how locally generated or repository-based assay libraries affect SWATH performance for proteomic studies. To undertake this analysis, we developed a software workflow, SwathXtend, which generates extended peptide assay libraries by integration with a local seed library and delivers statistical analysis of SWATH-quantitative comparisons. We designed test samples using peptides from a yeast extract spiked into peptides from human K562 cell lysates at three different ratios to simulate protein abundance change comparisons. SWATH-MS performance was assessed using local and external assay libraries of varying complexities and proteome compositions. These experiments demonstrated that local seed libraries integrated with external assay libraries achieve better performance than local assay libraries alone, in terms of the number of identified peptides and proteins and the specificity to detect differentially abundant proteins. Our findings show that the performance of extended assay libraries is influenced by the MS/MS feature similarity of the seed and external libraries, while statistical analysis using multiple testing corrections increases the statistical rigor needed when searching against large extended assay libraries.
Data Independent Acquisition (DIA) mass spectrometry workflows are gaining increasing use for proteomic analysis of model systems (1-8).
The first integrated DIA and quantitative analysis protocol, termed SWATH (2), was shown to offer accurate, reproducible, and robust proteomic quantification (9-14). DIA offers advantages over conventional IDA methods (15) by overcoming the stochastic, intensity-based selection of peptide precursors, a problem which typically leads to inconsistent peptide detection and quantitation between replicate runs. By overcoming this problem, DIA is highly suited to large-scale comparative analyses, as gaps in data points between samples are mostly eliminated. These digital, extensive proteome maps can be repeatedly mined for quantitative data by extracting ion chromatograms of defined peptides post-acquisition, and yield fewer quantitative missing (NA) values than IDA. An important concept in DIA analysis is the use of an LC-retention time referenced spectral ion assay library to enable peptide identification from DIA-generated multiplexed MS/MS spectra (10,13,16). The depth and quality of this spectral reference library directly correlate with experimental outcome, so we consider it essential to explore and understand this variable in detail. The reference assay library should contain all the prior knowle...
Background: One of the most significant challenges in colorectal cancer (CRC) management is the use of compliant early stage population-based diagnostic tests as adjuncts to confirmatory colonoscopy. Despite the near curative nature of early clinical stage surgical resection, mortality remains unacceptably high, because the majority of patients diagnosed by faecal haemoglobin testing followed by colonoscopy are diagnosed at later stages. Additionally, current population-based screens reliant on the fecal occult blood test (FOBT) have low compliance (~40%), and the tests suffer from low sensitivity. Therefore, blood-based diagnostic tests offer survival benefits from their higher compliance (≥ 97%), if they can at least match the sensitivity and specificity of FOBTs. However, discovery of low abundance plasma biomarkers is difficult because a high percentage of the proteomic discovery space is occupied by many high abundance plasma proteins (e.g., human serum albumin). Methods: A combination of high abundance protein ultradepletion strategies (e.g., MARS-14 and in-house IgY depletion columns), extensive peptide fractionation methods (SCX, SAX, high pH and SEC) and SWATH-MS was utilized to uncover protein biomarkers from a cohort of 100 plasma samples (i.e., pools of 20 healthy and 20 stages I-IV CRC plasmas). The differentially expressed proteins were analyzed using ANOVA and pairwise t-tests (p < 0.05; fold-change > 1.5), and further examined with a neural network classification method using 5000 in silico augmented patient datasets. Results: Ultradepletion combined with peptide fractionation allowed for the identification of a total of 513 plasma proteins, 8 of which had not been previously reported in human plasma (based on the PeptideAtlas database). SWATH-MS analysis revealed 37 protein biomarker candidates that exhibited differential expression across CRC stages compared to healthy controls.
Of those, 7 candidates (CST3, GPX3, CFD, MRC1, COMP, PON1 and ADAMDEC1) were validated using Western blotting and/or ELISA. The neural network classification narrowed down the candidate biomarkers to 5 proteins (SAA2, APCS, APOA4, F2 and AMBP) that maintained accuracy and could discern early (I/II) from late (III/IV) stage CRC. Conclusion: MS-based proteomics in combination with ultradepletion strategies has immense potential for identifying diagnostic protein biosignatures.