Identifying molecular cancer drivers is critical for precision oncology. Multiple advanced algorithms to identify drivers now exist, but systematic attempts to combine and optimize them on large datasets are few. We report a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations. We identify 299 driver genes with implications regarding their anatomical sites and cancer/cell types. Sequence- and structure-based analyses identified >3,400 putative missense driver mutations supported by multiple lines of evidence. Experimental validation confirmed 60%-85% of predicted mutations as likely drivers. We found that >300 MSI tumors are associated with high PD-1/PD-L1, and 57% of tumors analyzed harbor putative clinically actionable events. Our study represents the most comprehensive discovery of cancer genes and mutations to date and will serve as a blueprint for future biological and clinical endeavors.
The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
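The idea of testing observed variant counts against run-specific error rates can be illustrated with a small sketch. This is not LoFreq's implementation. It assumes that each base's error probability is taken from its Phred quality score and that errors are independent, so the number of erroneous bases in a pileup column follows a Poisson-binomial distribution. The function names, the `alpha` threshold and the Bonferroni-style `n_tests` correction are hypothetical choices made for illustration.

```python
from typing import Sequence


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(error_probs: Sequence[float], k: int) -> float:
    """Return P(X >= k), where X is a sum of independent Bernoulli(p_i) trials.

    Exact dynamic programme: dist[j] holds the probability of exactly j
    errors so far for j < k, and dist[k] absorbs "k or more" errors.
    """
    if k <= 0:
        return 1.0
    if k > len(error_probs):
        return 0.0
    dist = [0.0] * (k + 1)
    dist[0] = 1.0
    for p in error_probs:
        # Update from the top down so each step reads not-yet-updated values.
        dist[k] += dist[k - 1] * p
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1 - p) + dist[j - 1] * p
        dist[0] *= 1 - p
    return dist[k]


def is_variant(quals: Sequence[int], alt_count: int,
               alpha: float = 0.05, n_tests: int = 1) -> bool:
    """Call a variant if alt_count is unlikely to arise from errors alone.

    quals: Phred scores of all bases covering the position (assumed input).
    n_tests: number of positions tested, used for a Bonferroni-style
             correction (an illustrative assumption).
    """
    probs = [phred_to_error_prob(q) for q in quals]
    p_value = poisson_binomial_tail(probs, alt_count)
    return p_value < alpha / n_tests


# Example: 1,000 reads at Q30, so about one erroneous base is expected.
column = [30] * 1000
print(is_variant(column, alt_count=10))  # ten alternate bases
print(is_variant(column, alt_count=1))   # one alternate base
```

Under these assumptions, ten alternate bases in the example column are very unlikely to be errors alone, whereas a single alternate base is consistent with the expected error count. The tail probability is computed exactly, in keeping with the abstract's statement that no approximations or heuristics are needed.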
In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.
Citrus is a large genus that includes several major cultivated species, including Citrus sinensis (sweet orange), Citrus reticulata (tangerine and mandarin), Citrus limon (lemon), Citrus grandis (pummelo) and Citrus paradisi (grapefruit). In 2009, the global citrus acreage was 9 million hectares and citrus production was 122.3 million tons (FAO statistics, see URLs), ranking first among all fruit crops. Among the 10.9 million tons (valued at $9.3 billion) of citrus products traded in 2009, sweet orange accounted for approximately 60% of citrus production for both fresh fruit and processed juice consumption (FAO statistics, see URLs). Moreover, citrus fruits and juice are the prime human source of vitamin C, an important component of human nutrition. Citrus fruits also have some unique botanical features, such as nucellar embryony (nucellus cells can develop into apomictic embryos that are genetically identical to the mother plant). Consequently, somatic embryos grow much more vigorously than the zygotic embryos in seeds, such that seedlings are essentially clones of the maternal parent. Such citrus-unique characteristics have hindered the study of citrus genetics and breeding improvement [1,2]. Complete genome sequences would provide valuable genetic resources for improving citrus crops. Citrus is believed to be native to southeast Asia [3-5], and cultivation of fruit crops occurred at least 4,000 years ago [3,6]. The genetic origin of the sweet orange is not clear, although it has been speculated that sweet orange might be derived from interspecific hybridization of some primitive citrus species [7,8]. Citrus is also in the order Sapindales, a sister order to the Brassicales in the Malvidae, making it valuable for comparative genomics studies with the model plant Arabidopsis. We aimed to sequence the genome of Valencia sweet orange (C. sinensis cv. Valencia), one of the most important sweet orange varieties cultivated worldwide and grown primarily for orange juice production. Normal sweet oranges are diploids, with nine pairs of chromosomes and an estimated genome size of ~367 Mb [9]. To reduce the complexity of the sequenced genome, we obtained a double-haploid (dihaploid) line derived from the anther culture of Valencia sweet orange [10]. We first generated whole-genome shotgun paired-end-tag sequence reads from the dihaploid genomic DNA and built a de novo assembly as the citrus reference genome; we then produced shotgun sequencing reads from the parental diploid DNA and mapped the sequences to the haploid reference genome to obtain the complete genome information for Valencia sweet orange. In addition, we conducted comprehensive transcriptome sequencing analyses for four representative tissues using shotgun RNA sequencing (RNA-Seq) to capture all transcribed sequences and paired-end-tag RNA sequencing (RNA-PET) to demarcate the 5′ and 3′ ends of all transcripts. On the basis of the DNA and RNA sequencing data, we characterized the orange genome for its gene content, heterozygosity and evolutionary features. ...
Gastric cancer is a major cause of global cancer mortality. We surveyed the spectrum of somatic alterations in gastric cancer by sequencing the exomes of 15 gastric adenocarcinomas and their matched normal DNAs. Frequently mutated genes in the adenocarcinomas included TP53 (11/15 tumors), PIK3CA (3/15) and ARID1A (3/15). Cell adhesion was the most enriched biological pathway among the frequently mutated genes. A prevalence screening confirmed mutations in FAT4, a cadherin family gene, in 5% of gastric cancers (6/110) and FAT4 genomic deletions in 4% (3/83) of gastric tumors. Frequent mutations in chromatin remodeling genes (ARID1A, MLL3 and MLL) also occurred in 47% of the gastric cancers. We detected ARID1A mutations in 8% of tumors (9/110), which were associated with concurrent PIK3CA mutations and microsatellite instability. In functional assays, we observed both FAT4 and ARID1A to exert tumor-suppressor activity. Somatic inactivation of FAT4 and ARID1A may thus be key tumorigenic events in a subset of gastric cancers.