The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platform provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful-styled Elastic Search engine can query millions of records within one second. Also, several new features, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, RESTful Web Service API, and a high-efficiency reanalysis pipeline, have been added to iProX for better open data sharing. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast growing field of proteomics.
In light of the rapid accumulation of large-scale omics datasets, numerous studies have attempted to characterize the molecular and clinical features of cancers from a multi-omics perspective. However, there are great challenges in integrating multi-omics using machine learning methods for cancer subtype classification. In this study, MoGCN, a multi-omics integration model based on graph convolutional network (GCN) was developed for cancer subtype classification and analysis. Genomics, transcriptomics and proteomics datasets for 511 breast invasive carcinoma (BRCA) samples were downloaded from the Cancer Genome Atlas (TCGA). The autoencoder (AE) and the similarity network fusion (SNF) methods were used to reduce dimensionality and construct the patient similarity network (PSN), respectively. Then the vector features and the PSN were input into the GCN for training and testing. Feature extraction and network visualization were used for further biological knowledge discovery and subtype classification. In the analysis of multi-dimensional omics data of the BRCA samples in TCGA, MoGCN achieved the highest accuracy in cancer subtype classification compared with several popular algorithms. Moreover, MoGCN can extract the most significant features of each omics layer and provide candidate functional molecules for further analysis of their biological effects. And network visualization showed that MoGCN could make clinically intuitive diagnosis. The generality of MoGCN was proven on the TCGA pan-kidney cancer datasets. MoGCN and datasets are public available at https://github.com/Lifoof/MoGCN. Our study shows that MoGCN performs well for heterogeneous data integration and the interpretability of classification results, which confers great potential for applications in biomarker identification and clinical diagnosis.
The ubiquitin-like protein FAT10 (HLA-F adjacent transcript 10) is uniquely expressed in mammals. The fat10 gene is encoded in the MHC class I locus in the human genome and is related to some specific processes, such as apoptosis, immune response, and cancer. However, biological knowledge of FAT10 is limited, owing to the lack of identification of its conjugates. FAT10 covalently modifies proteins in eukaryotes, but only a few substrates of FAT10 have been reported until now, and no FATylated sites have been identified. Here, we report the proteome-scale identification of FATylated proteins by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). We identified 175 proteins with high confidence as FATylated candidates. A total of 13 modified sites were identified for the first time by a modified search of the raw MS data. The modified sites were highly enriched with hydrophilic amino acids. Furthermore, the FATylation processes of hnRNP C2, PCNA, and PDIA3 were verified by a coimmunoprecipitation assay. We confirmed that most of the substrates were covalently attached to a FAT10 monomer. The functional distribution of the FAT10 targets suggests that FAT10 participates in various biological processes, such as translation, protein folding, RNA processing, and macromolecular complex assembly. These results should be very useful for investigating the biological functions of FAT10.
LiverAtlas is the most comprehensive liver and hepatic disease resource, which helps biologists and clinicians to analyse their data at the systems level and will contribute much to the biomarker discovery and diagnostic performance enhancement for liver diseases.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.