Molecular mechanisms of virus-related diseases involve multiple factors, including viral mutation accumulation and integration of a viral genome into the host DNA. With increasing attention being paid to virus-mediated pathogenesis and the development of many useful technologies to identify virus mutations (VMs) and viral integration sites (VISs), much research on these topics is available in PubMed. However, knowledge of VMs and VISs is widely scattered across numerous published papers that lack standardization, integration and curation. To address these challenges, we built a pilot database of human disease-related Virus Mutations, Integration sites and Cis-effects (ViMIC), which specializes in three features: virus mutation sites, viral integration sites and target genes. In total, ViMIC provides information on 31 712 VM entries, 105 624 VISs, 16 310 viral target genes and 1 110 015 virus sequences of eight viruses in 77 human diseases obtained from the public domain. Furthermore, ViMIC lets users explore the cis-effects of virus-host interactions by surveying 78 histone modifications, binding of 1358 transcription regulators and chromatin accessibility on these VISs. We believe ViMIC will become a valuable resource for the virus research community. The database is available at http://bmtongji.cn/ViMIC/index.php.
Molecular mechanisms of virus-related diseases involve multiple factors, including viral mutation accumulation and integration of a viral genome into the host DNA. With increasing attention being paid to virus-mediated pathogenesis and the development of many useful technologies to identify virus mutations (VMs) and viral integration sites (VISs), abundant literature on these topics is available in PubMed. However, knowledge of VMs and VISs is widely scattered across numerous published papers, and both the association of VMs with VISs in the viral genome and the functional annotation of VISs still lack integration and curation. To address these challenges, we built a database of human disease-related Virus Mutations, Integration sites and Cis-effects (ViMIC), which specializes in three features: virus mutation sites, viral integration sites and target genes. In total, ViMIC provides information on 6,461 VMs, 79,089 VISs, and 15,056 viral target genes of 8 viruses in 65 human diseases obtained from the literature. Furthermore, ViMIC lets users explore the cis-effects of virus-host interactions by surveying 78 histone modifications, binding of 1,358 transcription regulators, and chromatin accessibility on these VISs. We believe ViMIC will become a valuable resource for the virus research community. The database is available at http://bmtongji.cn/ViMIC/index.php.
Motivation: Virus mutation is one of the most important research issues; it plays a critical role in disease progression and has prompted a substantial number of scientific publications. Mutation extraction from published literature has become an increasingly important task, benefiting many downstream applications such as vaccine design and drug usage. However, most existing approaches perform poorly in extracting virus mutations, both because precise virus mutation information is lacking and because they were developed for human gene mutations. Results: We developed ViMRT, a text-mining tool and search engine for automated virus mutation recognition using natural language processing. ViMRT relies mainly on 8 optimized rules and 12 regular expressions developed on a dataset comprising 830 papers on 5 viruses related to severe human diseases. It achieved higher performance than other tools on a test dataset (1,662 papers, F1-score of 99.17%) and has been applied successfully to two other viruses, influenza virus and severe acute respiratory syndrome coronavirus-2 (212 papers, F1-score of 96.99%). These results indicate that ViMRT is a high-performance method for extracting virus mutations from the biomedical literature. In addition, we present a search engine that lets researchers quickly and accurately find virus mutation-related information, including virus genes and related diseases. Availability: ViMRT software is freely available at http://bmtongji.cn:1225/mutation/index.
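To illustrate what regular-expression-based mutation recognition can look like, here is a minimal Python sketch. It is not one of ViMRT's 8 rules or 12 regular expressions, which the abstract does not list. The pattern, the `MutationMention` type and the `find_point_mutations` function are all hypothetical, and the sketch assumes mentions are written in the compact "wild type, position, mutant" form (for example A1762T), optionally preceded by a short lowercase gene or domain prefix (for example rtM204V).

```python
import re
from typing import List, NamedTuple


class MutationMention(NamedTuple):
    """One point-mutation mention (hypothetical structure, for illustration)."""
    prefix: str      # optional lowercase gene/domain prefix, e.g. "rt"
    wild_type: str   # original residue or base
    position: int    # 1-based position
    mutant: str      # substituted residue or base


# One-letter amino-acid codes; A, C, G and T also cover nucleotide mentions.
_RESIDUE = "ACDEFGHIKLMNPQRSTVWY"

# Assumed pattern: up to three lowercase prefix letters, one residue,
# a run of digits, one residue, all bounded by word boundaries.
MUTATION_PATTERN = re.compile(
    rf"\b([a-z]{{0,3}})([{_RESIDUE}])(\d+)([{_RESIDUE}])\b"
)


def find_point_mutations(text: str) -> List[MutationMention]:
    """Return every compact point-mutation mention found in the text."""
    return [
        MutationMention(m.group(1), m.group(2), int(m.group(3)), m.group(4))
        for m in MUTATION_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sentence = ("The rtM204V substitution and the A1762T/G1764A double "
                "mutation were detected; H1N1 was not studied.")
    for mention in find_point_mutations(sentence):
        print(mention)
```

The trailing word boundary keeps subtype names such as H1N1 from being reported as mutations, because the second letter is followed by another digit. A single pattern like this will still miss spelled-out forms ("alanine to threonine at position 1762") and will accept non-mutation tokens of the same shape, which is presumably why a production tool combines several expressions with additional rules.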
Gastrointestinal (GI) cancer is common, characterized by high mortality, and includes oesophageal, gastric, liver, bile duct, pancreatic, rectal and colon cancers. The insufficient specificity and sensitivity of biomarkers is still a key clinical hindrance to GI cancer diagnosis and successful treatment. The emergence of the ‘precision medicine’, ‘basket trial’ and ‘field cancerization’ concepts creates an urgent need to understand how organ system cancers arise at the molecular level. Knowledge from both the literature and data available in public databases is informative in elucidating the molecular alterations underlying GI cancer. Currently, most available cancer databases do not offer comprehensive discovery of gene-disease associations, molecular alterations and clinical information through integrated text mining and data mining in GI cancer. We developed GIDB, a panoptic knowledge database that attempts to automate the curation of molecular signatures using natural language processing approaches and multidimensional analyses. GIDB covers information on 8730 genes supported by both literature and data evidence, 248 miRNAs, 58 lncRNAs, 320 copy number variations, 49 fusion genes and 2381 semantic networks. It is a comprehensive database that not only presents supporting evidence and integrated data side by side for signatures associated with GI cancer but also provides a timeline of major molecular discoveries. It offers a comprehensive overview of genes in GI cancer, highlighting research hotspots and the historical development of knowledge. Furthermore, GIDB characterizes genomic abnormalities in multilevel analysis, including simple somatic mutations, gene expression, DNA methylation and prognosis.
GIDB offers a user-friendly interface and two customizable online tools (Heatmap and Network) that let experimental researchers and clinicians explore the data, shorten their learning curve and broaden the scope of their knowledge. More importantly, GIDB is an ongoing research project: it will continue to be updated, and its automated method will be improved to reduce manual work.
Tumor mutational burden (TMB) is considered a potential biomarker for predicting the response to and effect of immune checkpoint inhibitors (ICIs). However, standards for gene panels based on next-generation sequencing are still inconsistent, and the correlation among TMB genes, immune cell infiltration, and prognosis is poor. We applied text-mining technology to construct specific TMB-associated gene panels across various cancer types. As a case exploration, Pearson’s correlation between TMB genes and immune cell infiltration was further analyzed in colorectal cancer. We then performed LASSO Cox regression to construct a prognosis predictive model and calculated the risk score of each sample for receiver operating characteristic (ROC) analysis. The results showed that the assessment of TMB gene panels performed well with fewer than 500 genes, highly mutated genes, and the inclusion of synonymous mutations and immune regulatory and drug-target genes. Moreover, the analysis of TMB differentially expressed genes (DEGs) suggested that JAKMIP1 was strongly correlated with the gene expression level of CD8+ T cell markers in colorectal cancer. Additionally, the prognosis predictive model based on 19 TMB DEGs reached AUCs of 0.836, 0.818, and 0.787 in 1-, 3-, and 5-year OS models, respectively (C-index: 0.810). In summary, the gene panel performed well, and TMB DEGs showed great potential value in assessing immune cell infiltration and in predicting survival.
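The two quantities named in this abstract, Pearson's correlation and a per-sample risk score, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the risk score is assumed to be the usual Cox linear predictor (the sum of each gene's coefficient times its expression), and the gene names, coefficients and expression values in the usage section are made up. The functions `pearson` and `risk_score` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def risk_score(coefficients: Dict[str, float],
               expression: Dict[str, float]) -> float:
    """Linear predictor: sum of coefficient * expression over model genes.

    Assumes `coefficients` holds the non-zero weights kept by a fitted
    LASSO Cox model; fitting the model itself is out of scope here.
    """
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


if __name__ == "__main__":
    # Made-up values, for illustration only.
    gene_expr = [1.0, 2.0, 3.0, 4.0]
    marker_expr = [1.1, 1.9, 3.2, 3.8]
    print(pearson(gene_expr, marker_expr))

    coefs = {"GENE_A": 0.5, "GENE_B": -0.2}   # hypothetical genes and weights
    sample = {"GENE_A": 2.0, "GENE_B": 1.0}   # hypothetical expression values
    print(risk_score(coefs, sample))
```

In a workflow like the one described, each sample's score would then be compared against observed survival to draw ROC curves at fixed time points; that step needs survival data and is not sketched here.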