Detection of somatic mutations from tumor and matched normal sequencing data has become a standard approach in cancer research. Although a number of mutation callers have been developed, it is still difficult to detect mutations with low allele frequency, even in exome sequencing. We expect overlapping paired-end read information to be effective for this purpose, but no mutation caller has statistically modeled overlapping information in a proper form for exome sequence data. Here, we develop a Bayesian hierarchical method, OVarCall (https://github.com/takumorizo/OVarCall), in which overlapping paired-end read information improves the accuracy of low-allele-frequency mutation detection. First, we construct two generative models: one for reads with somatic variants generated from tumor cells, and the other for reads that do not have somatic variants but may include sequencing errors. Second, we calculate the marginal likelihood of each model using a variational Bayesian algorithm and compute the Bayes factor for the detection of somatic mutations. We empirically evaluated the performance of OVarCall and confirmed that it outperforms other existing methods.
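As a rough illustration of the Bayes factor step, the sketch below compares a tumor model and an error model for the alt-read count at a single candidate site. It is not OVarCall's model: it ignores read overlap, replaces the variational Bayesian algorithm with closed-form Beta-Binomial marginal likelihoods, and the prior hyperparameters (`tumor_prior`, `error_prior`) are hypothetical choices made for illustration.

```python
from math import lgamma
from typing import Tuple


def log_beta(a: float, b: float) -> float:
    """log B(a, b) computed via log-gamma functions."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k: int, n: int, a: float, b: float) -> float:
    """log p(k alt reads out of n) when the alt fraction p ~ Beta(a, b).

    The binomial coefficient is omitted because it is identical under both
    models and cancels in the Bayes factor.
    """
    return log_beta(k + a, n - k + b) - log_beta(a, b)


def log_bayes_factor(k: int, n: int,
                     tumor_prior: Tuple[float, float] = (1.0, 1.0),
                     error_prior: Tuple[float, float] = (1.0, 999.0)) -> float:
    """log[p(data | tumor model) / p(data | error model)].

    Hypothetical priors: the tumor model allows any alt fraction in [0, 1],
    while the error model concentrates it near a 0.1% per-read error rate.
    """
    return (log_marginal(k, n, *tumor_prior)
            - log_marginal(k, n, *error_prior))
```

With these assumed priors, `log_bayes_factor(5, 100)` is positive (5 alt reads out of 100 favor the tumor model), while `log_bayes_factor(0, 100)` is negative. Working in log space with `lgamma` avoids underflow at high read depths.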
Motivation: Detection of somatic mutations from tumor and matched normal sequencing data has become one of the most important analysis methods in cancer research. Some existing mutation callers have focused on additional information, e.g. heterozygous single-nucleotide polymorphisms (SNPs) near mutation candidates or overlapping paired-end read information. However, existing methods cannot take multiple information sources into account simultaneously. Existing Bayesian hierarchical model-based methods construct two generative models, the tumor model and the error model, but only a limited set of information sources has been modeled.
Results: We proposed a Bayesian model integration framework named partitioning-based model integration. In this framework, we introduce partitions of paired-end reads based on given information sources, which allows us to integrate existing generative models and utilize multiple information sources. Based on this framework, we constructed a novel Bayesian hierarchical model-based method named OHVarfinDer. In both the tumor model and the error model, we introduced partitions for the set of paired-end reads that cover a mutation candidate position, and applied a different generative model to each category of paired-end reads. We demonstrated that our method can effectively utilize both heterozygous SNP information and overlapping paired-end read information on simulated and real datasets.
Availability and implementation: https://github.com/takumorizo/OHVarfinDer.
Supplementary information: Supplementary data are available at Bioinformatics online.
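The partitioning idea can be sketched with a toy tumor model in which two read categories share one allele fraction f but use different likelihoods. This is a hypothetical simplification, not OHVarfinDer's model: it covers only overlapping versus non-overlapping reads (no heterozygous SNP partition), assumes independent per-mate errors at a fixed rate `e`, places a uniform prior on f integrated on a grid, and treats f = 0 as the error model. All function names are illustrative.

```python
import math
from typing import Sequence


def pair_lik(m: int, f: float, e: float) -> float:
    """P(m of 2 overlapping mates show the alt base | allele fraction f).

    Assumption: the fragment carries the variant with probability f, and each
    mate then reads the base independently with error rate e.
    """
    comb = (1, 2, 1)[m]
    from_variant = comb * (1 - e) ** m * e ** (2 - m)
    from_reference = comb * e ** m * (1 - e) ** (2 - m)
    return f * from_variant + (1 - f) * from_reference


def single_lik(alt: bool, f: float, e: float) -> float:
    """P(observed base | f) for a read whose mate does not cover the site."""
    p_alt = f * (1 - e) + (1 - f) * e
    return p_alt if alt else 1 - p_alt


def log_lik(f: float, overlapping: Sequence[int],
            single: Sequence[bool], e: float) -> float:
    """Joint log-likelihood: each partition has its own likelihood, sharing f."""
    return (sum(math.log(pair_lik(m, f, e)) for m in overlapping)
            + sum(math.log(single_lik(a, f, e)) for a in single))


def log_marginal_tumor(overlapping: Sequence[int], single: Sequence[bool],
                       e: float = 0.01, grid: int = 2000) -> float:
    """log of the integral of L(f) df under f ~ Uniform(0, 1), midpoint rule."""
    logs = [log_lik((i + 0.5) / grid, overlapping, single, e)
            for i in range(grid)]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs) / grid)


def log_bf(overlapping: Sequence[int], single: Sequence[bool],
           e: float = 0.01) -> float:
    """Toy log Bayes factor: tumor model (f ~ Uniform(0, 1)) vs error model (f = 0)."""
    return (log_marginal_tumor(overlapping, single, e)
            - log_lik(0.0, overlapping, single, e))
```

In this toy setting, `log_bf([2] * 5 + [0] * 45, [])` is positive, whereas `log_bf([1] * 5 + [0] * 45, [])` is negative: an overlapping pair with the alt base on only one mate has likelihood 2e(1 - e) for every f, so it looks like a sequencing error rather than evidence for the variant.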
Human leukocyte antigen (HLA) genes provide useful information on the relationship between cancer and the immune system. Despite the ease of obtaining these data through next-generation sequencing methods, interpretation of these relationships remains challenging owing to the complexity of HLA genes. To resolve this issue, we developed a Bayesian method, ALPHLARD-NT, to identify HLA germline and somatic mutations as well as HLA genotypes from whole-exome sequencing (WES) and whole-genome sequencing (WGS) data. ALPHLARD-NT showed 99.2% accuracy for WGS-based HLA genotyping and detected five HLA somatic mutations in 25 colon cancer cases. In addition, ALPHLARD-NT identified 88 HLA somatic mutations, including recurrent mutations and a novel HLA-B type, from WES data of 343 colon adenocarcinoma cases. These results demonstrate the potential of ALPHLARD-NT for accurate analysis of HLA genes even from low-coverage datasets. This method can become an essential tool for comprehensive analyses of HLA genes from WES and WGS data, helping to advance the understanding of immune regulation in cancer and providing guidance for novel immunotherapy strategies.
We propose a Bayesian method termed MultiMuC for accurate detection of somatic mutations (mutation calling) from multi-regional tumor sequence datasets. To improve detection performance, our method is based on the assumption of mutation sharing: if we can predict that at least one tumor region has the mutation, then we can detect the mutation in more tumor regions with greater confidence by lowering the original detection threshold. We find two drawbacks in existing methods that leverage the assumption of mutation sharing. First, existing methods do not consider the probability of the "No-TP (True Positive)" case, in which we observe mutation candidates in multiple regions but no true mutations actually exist. Second, existing methods cannot leverage scores from other state-of-the-art mutation calling methods for a single-regional tumor. We overcome the first drawback by evaluating the probability of the No-TP case. We solve the second drawback with Bayes-factor-based model construction, which enables flexible integration of probability-based mutation call scores as building blocks of a Bayesian statistical model. Through a real-data-based simulation study, we empirically show that our method consistently improves results from mutation calling methods for a single-regional tumor, e.g., Strelka2 and NeuSomatic, and outperforms existing methods for multi-regional tumors. Our implementation of MultiMuC is available at https://github.com/takumorizo/MultiMuC.
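One way to picture mutation sharing with an explicit No-TP case is a two-level model built on per-region Bayes factors. The sketch below is a hypothetical illustration, not MultiMuC's actual model: `prior_any` (prior probability that the site is mutated in some region) and `share` (probability that each region carries the mutation, given that the site is mutated) are assumed parameters, and each entry of `log_bfs` stands for a single-region caller's score expressed as a log Bayes factor.

```python
import math
from typing import List, Tuple


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def region_posteriors(log_bfs: List[float], prior_any: float = 0.01,
                      share: float = 0.5) -> Tuple[List[float], float]:
    """Hypothetical two-level mutation-sharing model.

    Z = 0 ("No-TP"): no region carries the mutation.
    Z = 1: each region independently carries it with probability `share`.
    log_bfs[r] = log p(D_r | mutated) - log p(D_r | not mutated).
    Returns (posterior mutation probability per region, P(No-TP | data)).
    """
    log_share, log_rest = math.log(share), math.log(1.0 - share)
    # Dividing both hypotheses by prod_r p(D_r | not mutated) gives the
    # posterior log-odds of Z=1 vs Z=0 as the prior log-odds plus
    # sum_r log(share * BF_r + 1 - share).
    log_odds_z = math.log(prior_any) - math.log(1.0 - prior_any)
    log_odds_z += sum(_logaddexp(log_share + lb, log_rest) for lb in log_bfs)
    p_z1 = _sigmoid(log_odds_z)
    # Given Z = 1, regions are independent, so
    # P(M_r = 1 | D, Z = 1) = share * BF_r / (share * BF_r + 1 - share).
    posts = [p_z1 * _sigmoid(log_share + lb - log_rest) for lb in log_bfs]
    return posts, 1.0 - p_z1
```

With these defaults, a region with log Bayes factor 2.0 has a posterior of about 0.04 when the other region shows no evidence (`[2.0, 0.0]`), but about 0.88 when the other region shows strong evidence (`[2.0, 10.0]`). This mirrors the threshold-lowering effect of mutation sharing, while the returned No-TP probability keeps weak, scattered candidates from being called.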