A full description of the human proteome relies on the challenging task of detecting mature and changing forms of protein molecules in the body. Large-scale proteome analysis [1] has routinely involved digesting intact proteins followed by inferred protein identification using mass spectrometry (MS) [2]. This “bottom up” process affords a high number of identifications (not always unique to a single gene). However, complications arise from incomplete or ambiguous [2] characterization of alternative splice forms, diverse modifications (e.g., acetylation and methylation), and endogenous protein cleavages, especially when combinations of these create complex patterns of intact protein isoforms and species [3]. “Top down” interrogation of whole proteins can overcome these problems for individual proteins [4,5], but has not been achieved on a proteome scale due to the lack of intact protein fractionation methods that are well integrated with tandem MS. Here we show, using a new four-dimensional (4D) separation system, identification of 1,043 gene products from human cells that are dispersed into >3,000 protein species created by post-translational modification, RNA splicing, and proteolysis. The overall system produced >20-fold increases in both separation power and proteome coverage, enabling the identification of proteins up to 105 kilodaltons and those with up to 11 transmembrane helices. Many previously undetected isoforms of endogenous human proteins were mapped, including changes in multiply-modified species in response to accelerated cellular aging (senescence) induced by DNA damage. Integrated with the latest version of the Swiss-Prot database [6], the data provide precise correlations to individual genes and proof-of-concept for large-scale interrogation of whole protein molecules. The technology promises to improve the link between proteomics data and complex phenotypes in basic biology and disease research [7].
The multiple myeloma SET domain (MMSET) protein is overexpressed in multiple myeloma (MM) patients with the translocation t(4;14). Although studies have shown the involvement of MMSET/Wolf-Hirschhorn syndrome candidate 1 in development, its mode of action in the pathogenesis of MM is largely unknown. We found that MMSET is a major regulator of chromatin structure and transcrip-
Introduction
Multiple myeloma is an incurable malignancy of mature plasma cells, associated in approximately 40% of cases with recurrent chromosomal translocations that lead to overexpression of known and putative oncogenes [1,2]. MMSET (WHSC1, NSD2) is linked to the immunoglobulin promoter/enhancer in t(4;14) translocations, found in 15%-20% of multiple myeloma [3]. Chromosomal fusion leads to overexpression of MMSET and FGFR3 genes; however, approximately 30% of the patient samples overexpress only the MMSET gene, suggesting its pivotal role in the disease [4-6]. The other nuclear receptor Su(var)3-9, Enhancer-of-zeste, Trithorax (SET) domain-containing (NSD) family members, NSD1 and NSD3, were both found to be rearranged as fusion proteins with NUP98 in rare cases of acute myeloid leukemia, and NSD3 is overexpressed in breast cancer [7,8], suggesting that deregulation of these proteins plays a causative role in malignancy. The MMSET gene undergoes complex alternative splicing and differential promoter usage, giving rise to a number of different transcripts from the locus, most of which are overexpressed in t(4;14) myelomas (Figure 1A) [2,9,10]. The protein domains found in full-length MMSET include 2 conserved Pro-Trp-Trp-Pro motif (PWWP) domains, 4 plant homeo domain fingers, and 1 SET domain, all of which are commonly found in transcriptional regulators [11,12]. Our previous report suggested that MMSET may be part of a corepressor complex [13]. SET domain-containing proteins can methylate lysine residues on histone tails [14].
Methylation and other covalent modifications of histone tails, such as acetylation, phosphorylation, ubiquitination, or sumoylation, can alter gene expression depending on the residue altered, the type of the modification, and whether the modified histone residue is found in a gene promoter, enhancer, or the body of a gene [15]. Promoters of actively transcribed genes are marked by the presence of H3K4me3, whereas the transcribed body of active genes is characterized by methylation at H3K36 (H3K36me3) [16,17]. By contrast, CpG islands are depleted of H3K36 methylation [18]. Inactive and silenced genes show methylation at H3K27me3 and H3K9me3, respectively [16,17,19]. Previous reports suggested promiscuous activity of the SET domains of the NSD family proteins. NSD1 was initially shown to methylate both H3 and H4 histones, and more recently its specificity has been narrowed down to lysine 36 on histone H3 [8]. Likewise, MMSET was able to methylate both H3 and H4 histones in vitro [13,20]. A recent report showed that the histone methyltransferase (HMT) activity of NSD proteins is substrate specific, helping explain these discrepa...
Top-down proteomics has improved over the last decade despite the significant challenges presented by the analysis of large protein ions. Here, the detection of these high-mass species by electrospray-based mass spectrometry (MS) is examined from a theoretical perspective to understand the mass-dependent increases in the number of charge states, isotopic peaks, and interfering species present in typical protein mass spectra. Integrating these effects into a quantitative model captures the reduced ability to detect species over 25 kDa with the speed and sensitivity characteristic of proteomics based on <3 kDa peptide ions. The model quantifies the challenge that top-down proteomics faces with respect to current MS instrumentation and projects that depletion of 13C and 15N isotopes can improve detection at high mass by less than 2-fold at 100 kDa, whereas the effect is up to 5-fold at 10 kDa. Further, we find that supercharging electrosprayed proteins to the point of producing <5 charge states at high mass would improve detection by more than 20-fold.
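The signal-splitting effect described in this abstract can be illustrated with a small Python sketch. It is not the authors' quantitative model: it only uses the standard relation m/z = (M + z·m_proton)/z for protonated ions and a toy assumption that total ion signal is divided evenly among charge states and isotopic peaks. The function names and the peak counts in the usage lines are hypothetical, chosen only for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def charge_state_mz(neutral_mass, charges):
    """Return {z: m/z} for [M + zH]^z+ ions of a molecule with the given neutral mass."""
    return {z: (neutral_mass + z * PROTON_MASS) / z for z in charges}


def per_channel_fraction(n_charge_states, n_isotope_peaks):
    """Fraction of total signal in one peak, assuming an even split (toy assumption)."""
    if n_charge_states < 1 or n_isotope_peaks < 1:
        raise ValueError("counts must be positive")
    return 1.0 / (n_charge_states * n_isotope_peaks)


# Hypothetical counts, for illustration only (not taken from the paper).
peptide = per_channel_fraction(n_charge_states=2, n_isotope_peaks=4)
protein = per_channel_fraction(n_charge_states=20, n_isotope_peaks=30)

print(charge_state_mz(25000.0, range(20, 23)))
print("peptide-to-protein per-peak signal ratio:", peptide / protein)
```

Under the even-split assumption, reducing the number of charge states (as supercharging aims to do) or the number of isotopic peaks (as isotope depletion aims to do) raises the per-peak fraction proportionally; real charge-state and isotopic distributions are not uniform, so the sketch only conveys the direction of the effect.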
ProSight PTM 2.0 (http://prosightptm2.scs.uiuc.edu) is the next generation of the ProSight PTM web-based system for the identification and characterization of proteins using top down tandem mass spectrometry. It introduces an entirely new data-driven interface, an integrated Sequence Gazer for protein characterization, support for fixed and terminal modifications, and improved support for multiple precursor ions (multiplexing). Furthermore, it supports data import and export for local analysis and collaboration.
The results of the challenge showed that the lungs and heart can be segmented fairly accurately by various algorithms, while deep-learning methods performed better on the esophagus. Our dataset together with the manual contours for all training cases continues to be available publicly as an ongoing benchmarking resource.