In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The comparison uses human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. 
Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low. Molecular & Cellular
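The two scores named in the abstract above can be illustrated with a minimal Python sketch. It assumes features are represented by hashable identifiers; the function name `selection_scores` and the toy feature sets are hypothetical and are not taken from the paper. The f-score is computed as the harmonic mean of recall and precision, and the g-score as the geometric mean of recall and true negative rate, as stated in the text.

```python
from math import sqrt


def selection_scores(selected, spiked, all_features):
    """Score a selected feature list against the known spike-related features.

    selected:     features chosen by a feature selection method
    spiked:       features known to be related to the spiked peptides
    all_features: every feature in the data set
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)

    tp = len(selected & spiked)                 # spike-related and selected
    fp = len(selected - spiked)                 # unrelated but selected
    fn = len(spiked - selected)                 # spike-related but missed
    tn = len(all_features - selected - spiked)  # unrelated and not selected

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0

    # f-score: harmonic mean of recall and precision
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # g-score: geometric mean of recall and true negative rate
    g_score = sqrt(recall * tnr)

    return {"recall": recall, "precision": precision, "tnr": tnr,
            "f_score": f_score, "g_score": g_score}


# Toy example (hypothetical data): 100 features, 10 of them spike-related;
# a method selects 8 true features plus 2 unrelated ones.
scores = selection_scores(
    selected=list(range(8)) + [50, 51],
    spiked=range(10),
    all_features=range(100),
)
print(scores)
```

Because unrelated features usually far outnumber spike-related ones, the true negative rate stays close to 1 in such data, so the g-score tends to be more forgiving of false positives than the f-score.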
Multidimensional chromatography coupled to mass spectrometry (LCn-MS) provides more separation power and an extended measured dynamic concentration range for analysing complex proteomics samples than one dimensional liquid chromatography coupled to mass spectrometry (1D-LC-MS). This review gives an overview of the most important aspects of LCn-MS with respect to optimizing peak capacity and evaluating orthogonality. We review recent developments in LCn-MS to analyse proteomics samples from the analyst's point of view and give an overview of methods and future developments to process LCn-MS data for comprehensive differential protein expression profiling. Examples from our research, such as combining protein fractionation using high temperature reverse phase (RP) columns followed by analysis of the trypsin-digested fractions by RP LC-MS, serve to highlight possibilities and shortcomings of present-day approaches. Other LCn-MS systems that have been used to analyse highly complex shotgun proteomic samples, such as the combination of RP columns using low and high pH eluents or the combination of hydrophilic interaction liquid chromatography (HILIC) with RP-MS, are discussed in detail. 
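To make the link between peak capacity and orthogonality concrete, the following Python sketch uses two common textbook simplifications. These are assumptions made here for illustration, not methods taken from the review: the ideal two dimensional peak capacity is taken as the product of the peak capacities of the two dimensions, and orthogonality is approximated by the fraction of bins in a normalised retention-time grid that contain at least one peak. All function names and the example values are hypothetical.

```python
def ideal_peak_capacity(n1, n2):
    """Product rule: upper bound for a fully orthogonal 2D separation."""
    return n1 * n2


def bin_coverage(points, n_bins):
    """Fraction of cells in an n_bins x n_bins grid occupied by peaks.

    points: (x, y) retention times in the two dimensions,
            each normalised to the interval [0, 1].
    """
    occupied = set()
    for x, y in points:
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        i = min(int(x * n_bins), n_bins - 1)
        j = min(int(y * n_bins), n_bins - 1)
        occupied.add((i, j))
    return len(occupied) / (n_bins * n_bins)


def effective_peak_capacity(n1, n2, coverage):
    """Scale the ideal peak capacity by the used fraction of separation space."""
    return n1 * n2 * coverage


# Hypothetical example: peaks lying on the diagonal, i.e. two strongly
# correlated (poorly orthogonal) separation dimensions.
diagonal_peaks = [(0.1, 0.1), (0.3, 0.3), (0.6, 0.6), (0.9, 0.9)]
coverage = bin_coverage(diagonal_peaks, n_bins=4)
print(ideal_peak_capacity(50, 100))                # 5000
print(effective_peak_capacity(50, 100, coverage))  # 1250.0
```

In this toy case only the four diagonal cells of the 4 x 4 grid are occupied, so the correlated dimensions use a quarter of the available separation space and the estimated peak capacity drops accordingly.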
Abbreviations: 1D-LC-MS, one dimensional liquid chromatography coupled to mass spectrometry; 2D, two dimensional; 2DGE, two dimensional gel electrophoresis; 2DGE-RP-MS, two dimensional separation system using one dimensional gel electrophoresis in the first and reverse phase chromatography in the second dimension; 2D-LC-MS, two dimensional liquid chromatography coupled to mass spectrometry; AF, affinity chromatography; AQUA, absolute quantification based on stable isotope labeled peptides; COFRADIC, combined fractional diagonal chromatography; DAD, diode array detector; ECD, electron capture detector; FID, flame ionization detector; GC×GC, two dimensional gas chromatography; HILIC, hydrophilic interaction liquid chromatography; HILIC-RP-MS, two dimensional liquid chromatography coupled to mass spectrometry using hydrophilic interaction liquid chromatography separation for the first and reverse phase liquid chromatography for the second dimension; ICAT, isotope coded affinity tag; iTRAQ, isotope tags for relative and absolute quantitation; LC, liquid chromatography; LCn, multidimensional liquid chromatography; LC-MS, liquid chromatography coupled to mass spectrometry; LC-MS/MS, liquid chromatography coupled to tandem mass spectrometry; LCn-MS, multidimensional liquid chromatography coupled to mass spectrometry; m/z, mass to charge ratio; MALDI, matrix assisted laser desorption ionisation; MRM, multiple reaction monitoring; MS/MS, tandem mass spectrometry; MudPIT, multidimensional protein identification technology; nanoLC, liquid chromatography using columns smaller than 300 µm; PSAQ, protein standard absolute quantification with stable isotope labeled recombinant proteins; QconCAT, quantification concatamer using artificial concatamer of proteotypic stable isotope labeled peptides; RP, reverse phase chromatography; RP-MS, one d...
Data processing forms an integral part of biomarker discovery and contributes significantly to the ultimate result. To compare and evaluate various publicly available open source label-free data processing workflows, we developed msCompare, a modular framework that allows the arbitrary combination of different feature detection/quantification and alignment/matching algorithms in conjunction with a novel scoring method to evaluate their overall performance. We used msCompare to assess the performance of workflows built from modules of publicly available data processing packages such as SuperHirn, OpenMS, and MZmine and our in-house developed modules on peptide-spiked urine and trypsin-digested cerebrospinal fluid (CSF) samples. We found that the quality of results varied greatly among workflows, and interestingly, heterogeneous combinations of algorithms often performed better than the homogeneous workflows. Our scoring method showed that the union of feature matrices of different workflows outperformed the original homogeneous workflows in some cases. msCompare is open source software (https://trac.nbic.nl/mscompare), and we provide a web-based data processing service for our framework by integration into the Galaxy server of the Netherlands Bioinformatics Center (http://galaxy.nbic.nl/galaxy) to allow scientists to determine which combination of modules provides the most accurate processing for their particular LC-MS data sets. Molecular & Cellular Proteomics 11: 10.1074/mcp.M111.015974, 1-13, 2012.
LC-MS is a well-established analysis technique in the field of proteomics and metabolomics (1-5). It is frequently used for comparative label-free profiling of preclassified sets of samples with the aim of identifying a set of discriminating compounds, which are either further used to select biomarker candidates or to identify pathways involved in the studied biological processes (6-8). 
However, the highly complex and large data sets necessitate the use of elaborate data processing workflows to reliably identify discriminatory compounds (9-11). The main aim in the quantitative processing of label-free LC-MS data is to obtain accurate quantitative information about the measured compounds, as well as proper matching of the same compounds across multiple samples. Quantification of compounds from raw mass spectrometry data can be performed in a number of ways. Spectral counting methods (11-14) are mainly used for proteomics samples and exploit the number of MS/MS spectra that are acquired per peptide ion(s) for protein quantification. These methods are easy to implement because they use the output of the peptide/protein identification tools, but they are less accurate than methods based on ion intensity for the determination of protein ratios (15, 16). Other widely used methods rely on single-stage MS information for compound quantification. In single-stage MS data, compounds (peptides, proteins, and metabolites) are detected and quantified in the raw mass spectrometry data, but they are not identified. Instead, algorithms locat...