Background
Proteogenomics integrates genomics, transcriptomics and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integrating disparate 'omic software tools, as well as customized tools for viewing and interpreting results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation.
Findings
MVP is built as an HTML Galaxy plugin, primarily based on JavaScript. Via the Galaxy API, MVP takes SQLite databases as input: a custom datatype (mzSQlite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information alongside genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrative Genomics Viewer JavaScript (IGVjs) framework, enabling interactive visualization of peptides and the corresponding transcript and genomic coding information within the MVP interface.
Conclusions
MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomics results within the Galaxy environment, adding a unique and critically needed tool for exploring and interpreting results. Its extensibility provides a basis for developing new functionalities for proteogenomic data visualization.
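MVP itself is written in JavaScript and reads its inputs through the Galaxy API. The Python sketch below only illustrates the kind of threshold-based peptide filtering described above, run directly against an SQLite database. The table and column names (`psm`, `sequence`, `score`, `mass_error_ppm`), the demo rows and the helper functions are hypothetical and do not reflect the actual mzSQlite schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, simplified PSM table.

    The real mzSQlite datatype has its own schema; these names are illustrative only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE psm (sequence TEXT, score REAL, mass_error_ppm REAL)")
    conn.executemany(
        "INSERT INTO psm VALUES (?, ?, ?)",
        [
            ("PEPTIDEK", 0.98, 1.2),
            ("LVNELTEFAK", 0.91, 4.8),
            ("AGLQFPVGR", 0.42, 2.0),   # fails a high score threshold
            ("SHORTK", 0.95, 12.5),     # fails a tight mass-error threshold
        ],
    )
    return conn


def filter_peptides(conn, min_score, max_abs_ppm, min_length):
    """Return distinct peptide sequences passing user-defined quality thresholds.

    Parameters are bound with '?' placeholders rather than string formatting.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT sequence FROM psm
        WHERE score >= ?
          AND ABS(mass_error_ppm) <= ?
          AND LENGTH(sequence) >= ?
        ORDER BY sequence
        """,
        (min_score, max_abs_ppm, min_length),
    ).fetchall()
    return [seq for (seq,) in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(filter_peptides(demo, min_score=0.9, max_abs_ppm=5.0, min_length=7))
```

With the demo rows, thresholds of score ≥ 0.9, |mass error| ≤ 5 ppm and length ≥ 7 keep `LVNELTEFAK` and `PEPTIDEK`. Keeping the thresholds as query parameters mirrors the idea of user-adjustable filters: the same query can be re-run as a user tightens or relaxes each metric.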
Findings
Proteogenomics has emerged as a powerful approach for characterizing expressed protein products in a wide variety of studies [1][2][3][4][5]. Proteogenomics, a multi-omic approach, integrates genomic and/or transcriptomic data with mass spectrometry (MS)-based proteomics data. Typically, a proteogenomics study starts with a sample (e.g. cells grown in culture or a tissue sample) that is analyzed using both next-generation sequencing technologies (usually RNA-Seq) and MS-based proteomics. Once the transcriptome has been assembled from RNA-Seq data, its sequence is translated in silico to generate a database of potentially expressed proteins encoded by the RNA. This protein sequence database contains both proteins of known sequence found in reference databases and novel protein sequences, which are identified by comparing the transcriptome sequence to the reference genome sequence. These novel sequences may include variants arising from single amino acid substitutions, short insertions/deletions, RNA processing events (truncations, splice variants) or even translation from unexpected genomic regions [2].
Parallel to the RNA-Seq analysis, tandem mas...
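The in silico translation and variant-identification steps described above can be illustrated with a short Python sketch. It translates a transcript in its three forward reading frames using the standard genetic code, splits each translation at stop codons into candidate protein segments (a simple stop-to-stop strategy), and reports single amino acid substitutions against a reference protein. This is a simplified sketch under stated assumptions, not the pipeline used in this work: the function names are hypothetical, only the forward strand is considered, and it does not handle insertions/deletions, splice variants or sequence alignment.

```python
# Standard genetic code (NCBI table 1), built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str, frame: int = 0) -> str:
    """Translate one forward reading frame; '*' marks stops, 'X' unknown codons."""
    seq = cdna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def stop_to_stop_segments(cdna: str, min_len: int = 5) -> list[str]:
    """Hypothetical helper: split each forward-frame translation at stop codons
    and keep segments of at least min_len residues as candidate database entries."""
    segments = []
    for frame in range(3):
        for segment in translate(cdna, frame).split("*"):
            if len(segment) >= min_len:
                segments.append(segment)
    return segments


def single_aa_variants(reference: str, observed: str) -> list[str]:
    """Report substitutions (e.g. 'G3V', 1-based) between equal-length proteins."""
    if len(reference) != len(observed):
        raise ValueError("only equal-length sequences are compared")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, observed), start=1)
        if ref != alt
    ]


if __name__ == "__main__":
    reference_tx = "ATGGCTGGTAAATGA"  # toy reference transcript: M A G K stop
    sample_tx = "ATGGCTGTTAAATGA"     # toy sample transcript with a GGT->GTT change
    ref_protein = translate(reference_tx).rstrip("*")
    obs_protein = translate(sample_tx).rstrip("*")
    print(single_aa_variants(ref_protein, obs_protein))
```

In this toy example, the single-nucleotide change in the third codon yields the substitution `G3V`. Stop-to-stop segments deliberately over-generate candidates (for example, frame 1 of the reference transcript also yields `WLVN` at `min_len=4`); in practice such a database is then searched with the MS data, which determines which candidate sequences are supported by peptide evidence.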