The organization of eukaryotic cells into distinct subcompartments is vital for all functional processes, and aberrant protein localization is a hallmark of many diseases. Microscopy methods, although powerful, are usually low-throughput and dependent on the availability of fluorescent fusion proteins or highly specific and sensitive antibodies. One method that provides a global picture of the cell is localization of organelle proteins by isotope tagging (LOPIT), which combines biochemical cell fractionation using density gradient ultracentrifugation with multiplexed quantitative proteomics mass spectrometry, allowing simultaneous determination of the steady-state distribution of hundreds of proteins within organelles. Proteins are assigned to organelles based on the similarity of their gradient distribution to those of well-annotated organelle marker proteins. We have substantially re-developed our original LOPIT protocol (published by Nature Protocols in 2006) to enable the subcellular localization of thousands of proteins per experiment (hyperLOPIT), including spatial resolution at the suborganelle and large protein complex level. This Protocol Extension article integrates all elements of the hyperLOPIT pipeline, including an additional enrichment strategy for chromatin, extended multiplexing capacity of isobaric mass tags, state-of-the-art mass spectrometry methods and multivariate machine-learning approaches for analysis of spatial proteomics data. We have also created an open-source infrastructure to support analysis of quantitative mass-spectrometry-based spatial proteomics data (http://bioconductor.org/packages/pRoloc) and an accompanying interactive visualization framework (http://www.bioconductor.org/packages/pRolocGUI). The procedure we outline here is applicable to any cell culture system and requires ∼1 week to complete sample preparation steps, ∼2 d for mass spectrometry data acquisition and 1-2 d for data analysis and downstream informatics.
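The assignment principle described above, matching each protein's gradient distribution against those of marker proteins, can be illustrated with a minimal nearest-centroid sketch. It assumes each protein is represented by its abundance across gradient fractions, normalises each profile to sum to 1, averages the marker profiles of each organelle, and assigns an unlabelled protein to the organelle whose mean profile is closest in Euclidean distance. The function names and the toy six-fraction profiles are hypothetical; pRoloc provides its own range of classifiers, and this is not its implementation.

```python
import math
from statistics import mean


def normalise(profile):
    """Scale a protein's fraction intensities so that they sum to 1."""
    total = sum(profile)
    return [x / total for x in profile]


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def marker_centroids(markers):
    """Average the normalised marker profiles of each organelle.

    markers: dict mapping organelle name -> list of raw fraction profiles.
    """
    centroids = {}
    for organelle, profiles in markers.items():
        normed = [normalise(p) for p in profiles]
        # zip(*normed) walks fraction by fraction across all markers
        centroids[organelle] = [mean(fraction) for fraction in zip(*normed)]
    return centroids


def assign(profile, centroids):
    """Return the closest organelle centroid and its distance."""
    normed = normalise(profile)
    distances = {org: euclidean(normed, c) for org, c in centroids.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical raw marker profiles across six gradient fractions
markers = {
    "ER": [[1, 5, 9, 4, 1, 0.5], [2, 6, 8, 3, 1, 0.5]],
    "mitochondrion": [[0.5, 1, 2, 5, 9, 6], [0.5, 1, 3, 6, 8, 5]],
}
centroids = marker_centroids(markers)
print(assign([1, 4, 10, 5, 1, 1], centroids))
```

Normalising each profile first means proteins are compared by the shape of their distribution along the gradient rather than by their absolute abundance.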
Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis. Molecular &
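As a simplified illustration of combining supervised classification with novelty detection, the sketch below classifies a protein by majority vote among its k nearest marker profiles, but returns "unknown" when even the nearest marker lies beyond a distance threshold, flagging proteins that may belong to an unannotated niche. The function name, the threshold `max_dist=0.2` and the four-fraction profiles are hypothetical assumptions; in practice such a threshold would have to be calibrated on the data, and the dedicated methods in pRoloc are more sophisticated than this.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_with_rejection(query, markers, k=3, max_dist=0.2):
    """Classify `query` by majority vote among its k nearest markers.

    markers: list of (normalised_profile, organelle_label) pairs.
    Returns "unknown" when the closest marker is farther than max_dist,
    i.e. the protein resembles no annotated organelle (a crude novelty flag).
    """
    neighbours = sorted(markers, key=lambda m: euclidean(query, m[0]))[:k]
    if euclidean(query, neighbours[0][0]) > max_dist:
        return "unknown"
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical normalised marker profiles over four gradient fractions
markers = [
    ([0.70, 0.20, 0.05, 0.05], "ER"),
    ([0.65, 0.25, 0.05, 0.05], "ER"),
    ([0.60, 0.30, 0.05, 0.05], "ER"),
    ([0.05, 0.05, 0.20, 0.70], "mitochondrion"),
    ([0.05, 0.05, 0.25, 0.65], "mitochondrion"),
    ([0.05, 0.10, 0.25, 0.60], "mitochondrion"),
]

for protein in ([0.66, 0.24, 0.05, 0.05], [0.10, 0.70, 0.15, 0.05]):
    print(protein, knn_with_rejection(protein, markers))
```

The rejection step keeps a protein with an unusual gradient profile from being forced into the nearest known class, which is the failure mode that novelty detection is meant to address.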
Highlights
- Protein subcellular localisation is essential for cellular homeostasis.
- Factors governing protein localisation are poorly understood.
- Various methods exist to study this process.
- Recent studies have captured localisation information at ever higher resolution.
- Orthogonal methods should be used to gain a holistic view of protein localisation.
The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein’s sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass-spectrometry-based spatial proteomics datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering, without prior annotation, a niche enriched in chromatin-associated proteins, and uncovers sub-nuclear compartmentalisation that was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.
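The idea of quantifying uncertainty in protein allocations, including allocation to a newly discovered niche, can be sketched with Bayes' rule in a toy mixture model. Here each annotated organelle is an isotropic Gaussian centred on a hypothetical mean profile, and one extra broad Gaussian stands in for a possible new niche; the posterior allocation probability is p(z = j | x) ∝ w_j N(x | μ_j, σ_j² I). All names, means, standard deviations and weights are assumptions chosen for illustration. Unlike the approach described above, this sketch fixes every parameter and the number of components rather than inferring them, so it shows only how allocation probabilities arise, not the full Bayesian model.

```python
import math


def log_gauss_iso(x, mu, sigma):
    """Log density of an isotropic Gaussian N(mu, sigma^2 I) evaluated at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)


def allocation_probs(x, components, weights):
    """Posterior allocation probabilities p(z = name | x).

    components: dict name -> (mean_profile, sigma)
    weights: dict name -> prior mixture weight
    Uses the log-sum-exp trick so that tiny densities do not underflow.
    """
    logs = {
        name: math.log(weights[name]) + log_gauss_iso(x, mu, sigma)
        for name, (mu, sigma) in components.items()
    }
    top = max(logs.values())
    norm = sum(math.exp(v - top) for v in logs.values())
    return {name: math.exp(v - top) / norm for name, v in logs.items()}


# Hypothetical components over four fractions; "new niche" is deliberately broad
components = {
    "ER": ([0.65, 0.25, 0.05, 0.05], 0.05),
    "mitochondrion": ([0.05, 0.05, 0.25, 0.65], 0.05),
    "new niche": ([0.25, 0.25, 0.25, 0.25], 0.3),
}
weights = {"ER": 0.45, "mitochondrion": 0.45, "new niche": 0.10}

for protein in ([0.66, 0.24, 0.05, 0.05], [0.10, 0.70, 0.15, 0.05]):
    print(protein, allocation_probs(protein, components, weights))
```

A protein whose profile resembles no annotated organelle places most of its posterior mass on the novelty component; in a full analysis, such proteins would be examined as candidates for a new sub-cellular niche.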
* ksl23@cam.ac.uk