2021
DOI: 10.1038/s41467-021-26111-3

A proteomics sample metadata representation for multiomics integration and big data analysis

Abstract: The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to va…

Cited by 68 publications (54 citation statements)
References 37 publications
“…As also reported in previous studies, one of the major bottlenecks was the curation of dataset metadata, consisting of mapping files to samples and biological conditions. Very recently, the MAGE-TAB-Proteomics format has been developed and formalised to enable the reporting of the experimental design in proteomics experience, including the relationship between samples and raw files, which is recorded in the SDRF-Proteomics section of the file [42]. Submission of the SDRF-Proteomics files to PRIDE is now supported.…”
Section: Discussion (mentioning)
confidence: 99%
“…), prevents a more streamlined reuse of the available data, especially in the case of reanalyses of quantitative proteomics datasets. The MAGE-TAB for proteomics (34), an extension of the format original MAGE-TAB format used in transcriptomics (35), has been recently proposed to capture the sample metadata, and the experimental design for proteomics experiments (Figure 2).…”
Section: Current Status Of the Pride Ecosystem: Resources And Tools (mentioning)
confidence: 99%
“…The SDRF-Proteomics is a tab-delimited format where each column is a property of the sample or the data file. Each row corresponds to the relation between a sample and a data file, and each cell is the value of the property for the sample or the data file (34) (https://github.com/bigbio/proteomics-metadata-standard).…”
Section: Current Status Of the Pride Ecosystem: Resources And Tools (mentioning)
confidence: 99%
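
The row-per-relation layout described in the statement above can be illustrated with a minimal Python sketch. It assumes a tiny, invented SDRF-like table; the column names, sample names, and file names are illustrative assumptions, not values taken from any curated dataset, and the helper `files_by_sample` is hypothetical rather than part of any published tool.

```python
import csv
import io
from collections import defaultdict

# Hypothetical SDRF-like content: one row per sample-to-data-file relation.
# Column names and all values are assumptions made up for this illustration.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tfile_a.raw\n"
    "sample 1\tHomo sapiens\trun 2\tfile_b.raw\n"
    "sample 2\tHomo sapiens\trun 3\tfile_c.raw\n"
)


def files_by_sample(sdrf_text, sample_col="source name",
                    file_col="comment[data file]"):
    """Group data file names by sample, reading tab-delimited text."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        # Each row links exactly one sample to exactly one data file.
        mapping[row[sample_col]].append(row[file_col])
    return dict(mapping)


sample_files = files_by_sample(SDRF_TEXT)
for sample, files in sample_files.items():
    print(sample, "->", ", ".join(files))
```

Because each row is a single sample-to-file relation, a sample measured in several runs simply appears on several rows, which is why the grouping step collects a list of files per sample.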
“…Mapping raw file names in PRIDE to the samples in the original publication was done manually and it constituted one of the most time-consuming steps in this work. In the context of the activities of the Proteomics Standards Initiative, a standard file format called SDRF-Proteomics (Sample and Data Relationship Format-Proteomics) file (as part of the file format MAGE-TAB-Proteomics) has been formalised recently 34 for capturing the experimental design in proteomics experiments 3 , and we have started working in the related tooling to facilitate the creation of these files. It is important to highlight that submission of SDRF-Proteomics files is already supported by PRIDE, although it is optional at the time of writing.…”
Section: Discussion (mentioning)
confidence: 99%