Metabolomics data are difficult to find and reuse, even in public repositories. We therefore developed the Reanalysis of Data User (ReDU) interface (https://redu.ucsd.edu/), a community- and data-driven approach that solves this problem at the repository scale. ReDU enables public data discovery and co- or re-analysis via uniformly formatted, publicly available MS/MS data and metadata in the Global Natural Product Social Molecular Networking Platform (GNPS), consistent with findable, accessible, interoperable, and reusable (FAIR) principles.1

Many simple but important questions can be asked using repository-scale public data. For example, what human biospecimen or sampling location is best for detecting a given drug? Or what molecules are found in humans <2 years old? Current metabolomics repositories typically require manually navigating and converting thousands of vendor-formatted files with inconsistent metadata formats, and developing data integration algorithms, which greatly complicates analyses.
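Once sample information is uniformly formatted, questions like the two above reduce to simple filters over a metadata table. The following is a minimal sketch of that idea, not ReDU's implementation: the `FileRecord` fields, the helper names, and the toy records are all assumptions invented for illustration and do not reflect the actual ReDU schema or any real data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class FileRecord:
    """One data file plus its sample information (hypothetical field names)."""
    filename: str
    species: str
    sample_type: str
    age_years: Optional[float]  # None when the age was not reported
    annotations: Set[str] = field(default_factory=set)


def molecules_in(records, predicate):
    """Union of annotated molecules over all files matching the predicate."""
    found = set()
    for record in records:
        if predicate(record):
            found |= record.annotations
    return found


def sample_types_with(records, molecule):
    """Count, per sample type, the files in which the molecule was annotated."""
    return Counter(r.sample_type for r in records if molecule in r.annotations)


def is_human_under_two(record):
    """Metadata filter: human samples with a reported age below 2 years."""
    return (
        record.species == "Homo sapiens"
        and record.age_years is not None
        and record.age_years < 2
    )


# Toy records invented for illustration only; not real repository content.
records = [
    FileRecord("f1.mzML", "Homo sapiens", "feces", 0.5, {"cholic acid"}),
    FileRecord("f2.mzML", "Homo sapiens", "feces", 35.0,
               {"12-ketodeoxycholic acid", "rosuvastatin"}),
    FileRecord("f3.mzML", "Homo sapiens", "blood", 1.5, {"bilirubin"}),
    FileRecord("f4.mzML", "Mus musculus", "feces", 0.2, {"cholic acid"}),
    FileRecord("f5.mzML", "Homo sapiens", "urine", None, {"urobilin"}),
]

infant_molecules = molecules_in(records, is_human_under_two)
cholic_acid_sources = sample_types_with(records, "cholic acid")
```

The point of the sketch is that both questions become one-line queries only because every record carries the same fields with the same vocabulary; with inconsistent metadata, each filter would first need a per-dataset conversion step.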
Results and Discussion

ReDU addresses FAIR principles by enabling users to find and choose files (Fig. 1a). This is possible because ReDU formats sample information consistently via a template and a drag-and-drop validator backed by standard controlled vocabularies and ontologies (e.g. NCBI taxonomy,2 UBERON,3 Disease Ontology,4 and MS ontology), and includes geographical location (important for natural products and environmental samples). ReDU automatically uses all public data in the GNPS/MassIVE repository that have the corresponding ReDU-compliant sample information. In total, 34,087 files in GNPS are ReDU-compatible, including natural and human-built environments, human and animal tissues, biofluids, food, and other data from around the world (Fig. 1f), analyzed using different instruments, ionization methods, sample preparation methods, etc. Of the 103,230,404 MS/MS spectra included in ReDU, 4,528,624 were annotated (a rate of 4.39% with settings yielding ~1% FDR) as one of 13,217 unique chemicals (Table S1).5,6,7

Uniformity of data and sample information in ReDU enables metadata-based and repository-scale analyses (Fig. 1b-g). Chemical explorer enables selection of a molecule and retrieval of its associations with the metadata, i.e. the sample information. For instance, selecting 12-ketodeoxycholic acid (filtering to include human feces) revealed that it was observed after infancy (Fig. 1c), whereas cholic acid displayed the opposite trend, coupled to the developing microbiome. Similarly, rosuvastatin was found in adults, matching prescription demographics. Another approach enabled is chemical enrichment analysis. For example, human blood, feces, and urine differed by bilirubin, urobilin, and stercobilin (Fig. 1d). Bilirubin was more frequently annotated in blood, and urobilin and stercobilin were most often annotated in feces.
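The comparison of annotation frequencies across sample types described above can be sketched as a per-group proportion: for each group, the fraction of files in which a given molecule was annotated. This is a minimal illustration under stated assumptions, not the enrichment procedure ReDU actually runs; the function names and the toy file list are hypothetical and carry no real results.

```python
from collections import defaultdict


def annotation_frequency(files, molecule):
    """Fraction of files per group in which `molecule` was annotated.

    `files` is an iterable of (group, annotations) pairs, where `annotations`
    is the set of molecule names annotated in that file.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, annotations in files:
        totals[group] += 1
        if molecule in annotations:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def most_frequent_group(files, molecule):
    """Group with the highest annotation frequency for `molecule`."""
    frequencies = annotation_frequency(files, molecule)
    return max(frequencies, key=frequencies.get)


# Toy input invented for illustration only; not real repository content.
files = [
    ("blood", {"bilirubin"}),
    ("blood", {"bilirubin", "urobilin"}),
    ("feces", {"urobilin", "stercobilin"}),
    ("feces", {"urobilin", "stercobilin"}),
    ("urine", {"urobilin"}),
    ("urine", set()),
]

bilirubin_freq = annotation_frequency(files, "bilirubin")
urobilin_freq = annotation_frequency(files, "urobilin")
```

Normalizing by the number of files per group matters because groups in a public repository are rarely the same size; raw hit counts would favor whichever sample type has the most deposited files.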
8 Similarly, comparison of bacterial cultures revealed differences in annotati...