Background: Representing biological networks as graphs is a powerful approach to reveal underlying patterns, signatures, and critical components from high-throughput biomolecular data. However, graphs do not natively capture the multi-way relationships present among genes and proteins in biological systems. Hypergraphs are generalizations of graphs that naturally model multi-way relationships and have shown promise in modeling systems such as protein complexes and metabolic reactions. In this paper we seek to understand how hypergraphs can more faithfully identify, and potentially predict, important genes based on complex relationships inferred from genomic expression data sets. Results: We compiled a novel data set of transcriptional host response to pathogenic viral infections and formulated relationships between genes as a hypergraph where hyperedges represent significantly perturbed genes, and vertices represent individual biological samples with specific experimental conditions. We find that hypergraph betweenness centrality is a superior method for identification of genes important to viral response when compared with graph centrality. Conclusions: Our results demonstrate the utility of using hypergraphs to represent complex biological systems and highlight central important responses in common to a variety of highly pathogenic viruses.
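The gene-as-hyperedge representation described above can be illustrated with a small sketch. The gene and condition names are hypothetical toy data, and the s-line graph shown here is one common way to expose hyperedge relationships to graph centrality measures such as betweenness; it is not claimed to be the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy data: each hyperedge is a gene, and its members are the
# experimental conditions (vertices) in which that gene is perturbed.
hyperedges = {
    "geneA": {"c1", "c2", "c3"},
    "geneB": {"c2", "c3", "c4"},
    "geneC": {"c4", "c5"},
    "geneD": {"c5", "c6"},
}

def s_line_graph(edges, s=1):
    """Link two hyperedges when they share at least s vertices."""
    adjacency = {name: set() for name in edges}
    for a, b in combinations(edges, 2):
        if len(edges[a] & edges[b]) >= s:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

# s=1 keeps every overlap; s=2 keeps only genes sharing two or more conditions.
line_graph_s1 = s_line_graph(hyperedges, s=1)
line_graph_s2 = s_line_graph(hyperedges, s=2)
print(line_graph_s1)
print(line_graph_s2)
```

Raising `s` prunes weak overlaps, so a centrality computed on the resulting graph reflects only genes that are co-perturbed across several conditions.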
Despite high sequence similarity between pandemic and seasonal influenza viruses, there is extreme variation in host pathogenicity from one viral strain to the next. Identifying the underlying mechanisms of variability in pathogenicity is a critical task for understanding influenza virus infection and effective management of highly pathogenic influenza virus disease. We applied a network-based modeling approach to identify critical functions related to influenza virus pathogenicity using large transcriptomic and proteomic datasets from mice infected with six influenza virus strains or mutants. Our analysis revealed two pathogenicity-related gene expression clusters; these results were corroborated by matching proteomics data. We also identified parallel downstream processes that were altered during influenza pathogenesis. We found that network bottlenecks (nodes that bridge different network regions) were highly enriched in pathogenicity-related genes, while network hubs (highly connected network nodes) were significantly depleted in these genes. We confirmed that this trend persisted in a distinct virus: Severe Acute Respiratory Syndrome Coronavirus (SARS). The role of epidermal growth factor receptor (EGFR) in influenza pathogenesis, one of the bottleneck regulators with corroborating signals across transcript and protein expression data, was tested and validated in additional mouse infection experiments. We demonstrate that EGFR is important during influenza infection, but the role it plays changes for lethal versus non-lethal infections. Our results show that by using association networks, bottleneck genes that lack hub characteristics can be used to predict a gene’s involvement in influenza virus pathogenicity. We also demonstrate the utility of employing multiple network approaches for analyzing host response data from viral infections.
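The distinction between hubs (high degree) and bottlenecks (high betweenness) can be made concrete on a toy network. The sketch below is an assumption-laden illustration, not the study's pipeline: the graph is hypothetical (two five-node cliques joined through a single bridge node `x`), and betweenness is computed with a plain implementation of Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque

def build_graph(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, BFS variant)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in non-decreasing distance
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}       # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was visited from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical network: two dense clusters connected only through node "x".
clique_a = [f"a{i}" for i in range(1, 6)]
clique_b = [f"b{i}" for i in range(1, 6)]
edges = [(u, v) for group in (clique_a, clique_b)
         for i, u in enumerate(group) for v in group[i + 1:]]
edges += [("a1", "x"), ("x", "b1")]

adj = build_graph(edges)
degree = {v: len(neighbors) for v, neighbors in adj.items()}
bc = betweenness(adj)
for node in ("x", "a1", "a2"):
    print(node, degree[node], bc[node])
```

In this toy graph the bridge node has the lowest degree but the highest betweenness, while interior clique members have higher degree and no betweenness at all, which is the kind of bottleneck-without-hub profile the abstract describes.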
Bottom-up proteomics is increasingly being used to characterize unknown environmental, clinical, and forensic samples. Proteomics-based bacterial identification typically proceeds by tabulating peptide "hits" (i.e., confidently identified peptides) associated with the organisms in a database; those organisms with enough hits are declared present in the sample. This approach has proven to be successful in laboratory studies; however, important research gaps remain. First, the common-practice reliance on unique peptides for identification is susceptible to a phenomenon known as signal erosion. Second, no general guidelines are available for determining how many hits are needed to make a confident identification. These gaps inhibit the transition of this approach to real-world forensic samples where conditions vary and large databases may be needed. In this work, we propose statistical criteria that overcome the problem of signal erosion and can be applied regardless of the sample quality or data analysis pipeline. These criteria are straightforward, producing a p-value on the result of an organism or toxin identification. We test the proposed criteria on 919 LC-MS/MS data sets originating from 2 toxins and 32 bacterial strains acquired using multiple data collection platforms. Results reveal a > 95% correct species-level identification rate, demonstrating the effectiveness and robustness of proteomics-based organism/toxin identification.
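The abstract does not spell out the proposed statistical criteria, so the sketch below only illustrates the general idea of attaching a p-value to a hit count. It assumes a simple one-sided binomial model, which is a stand-in and not the authors' method: `n` confidently identified peptides, each with a hypothetical chance `q` of matching the candidate organism when that organism is absent.

```python
from math import comb

def binomial_tail(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q): chance of k or more hits under the null."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: 200 confident peptides, 12 of which match the organism,
# with an assumed 2% background match rate.
p_value = binomial_tail(k=12, n=200, q=0.02)
print(p_value)
```

A small p-value indicates that the observed number of hits would be unlikely if the organism contributed no peptides. The background rate `q` would have to be estimated for the database and pipeline actually in use.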
Robust and highly specific methods for the detection of the protein toxin ricin are of interest to the law enforcement community. In previous studies, methods based on liquid chromatography−tandem mass spectrometry shotgun proteomics have been proposed. The successful implementation of this approach relies on specific data evaluation criteria addressing (1) the quality of the mass spectrometric data, (2) the confidence of peptide identifications (peptide-spectrum matches), and (3) the number and sequence specificity of peptides detected. We present such data evaluation criteria and use a novel approach to establish the limit of detection for this ricin assay. Specifically, we use logistic regression to determine the probability of detection for individual ricin peptides at different concentrations. We then apply basic rules from probability theory, combining these individual peptide probabilities into an overall assay limit of detection. This procedure yields an assay limit of detection for ricin at 42.5 ng on column or 21.25 ng/μL for a 2-μL injection. We also show that, despite the conventional wisdom that detergents are deleterious to mass spectrometric analyses, the presence of Tween-20 did not prevent detection of ricin peptides, and indeed assays performed in buffers that included Tween-20 gave better results than assays performed using other buffer formulations with or without detergent removal.
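The step of combining per-peptide detection probabilities into an assay-level limit of detection can be sketched as follows. Everything specific here is an assumption made for illustration: the logistic coefficients are hypothetical rather than fitted values, peptides are treated as independent, and the assay is counted as positive when at least `k` peptides are detected. The abstract does not state which combination rule was used.

```python
from math import exp, log

# Hypothetical logistic-regression coefficients per peptide:
# (intercept, slope on the natural log of the amount in ng).
peptide_models = {
    "peptide_1": (-6.0, 2.0),
    "peptide_2": (-7.0, 2.0),
    "peptide_3": (-8.0, 2.0),
}

def detection_probability(amount_ng, intercept, slope):
    """Logistic model for detecting a single peptide at a given amount."""
    return 1.0 / (1.0 + exp(-(intercept + slope * log(amount_ng))))

def prob_at_least(probs, k):
    """P(at least k successes) among independent trials with unequal probabilities."""
    dist = [1.0]  # dist[j] = probability of exactly j detections so far
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            updated[j] += mass * (1 - p)
            updated[j + 1] += mass * p
        dist = updated
    return sum(dist[k:])

def assay_probability(amount_ng, k=2):
    """Probability that the assay calls the toxin present at this amount."""
    probs = [detection_probability(amount_ng, b0, b1)
             for b0, b1 in peptide_models.values()]
    return prob_at_least(probs, k)

def limit_of_detection(target=0.95, k=2, lo=0.1, hi=1000.0):
    """Bisect on a log scale for the smallest amount reaching the target probability."""
    for _ in range(60):
        mid = (lo * hi) ** 0.5
        if assay_probability(mid, k) >= target:
            hi = mid
        else:
            lo = mid
    return hi

lod_ng = limit_of_detection()
print(lod_ng, lod_ng / 2.0)  # amount on column, and ng/uL for a 2-uL injection
```

The final division mirrors the conversion in the abstract, where 42.5 ng on column corresponds to 21.25 ng/μL for a 2-μL injection.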