Knowledge of the exact distribution of meiotic crossovers (COs) and gene conversions (GCs) is essential for understanding many aspects of population genetics and evolution, from haplotype structure and long-distance genetic linkage to the generation of new allelic variants of genes. To this end, we resequenced the four products of 13 meiotic tetrads along with 10 doubled haploids derived from Arabidopsis thaliana hybrids. GC detection through short reads has previously been confounded by genomic rearrangements. Rigid filtering for misaligned reads allowed GC identification at high accuracy and revealed an ∼80-kb transposition, which undergoes copy-number changes mediated by meiotic recombination. Non-crossover associated GCs were extremely rare most likely due to their short average length of ∼25–50 bp, which is significantly shorter than the length of CO-associated GCs. Overall, recombination preferentially targeted non-methylated nucleosome-free regions at gene promoters, which showed significant enrichment of two sequence motifs.DOI: http://dx.doi.org/10.7554/eLife.01426.001
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Besides the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives first insights into the degree and type of variation, which will be revealed once complete assemblies will replace resequencing or other reference-dependent methods.
The vast majority of cancer next-generation sequencing data consist of bulk samples composed of mixtures of cancer and normal cells. To study tumor evolution, subclonal reconstruction approaches based on machine learning are used to separate subpopulation of cancer cells and reconstruct their ancestral relationships. However, current approaches are entirely data-driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in subclonal reconstruction if tumor evolution is not accounted for, and that those errors increase when multiple samples are taken from the same tumor. To address this issue, we present a novel approach for model-based subclonal reconstruction that combines data-driven machine learning with evolutionary theory. Using public, synthetic and newly generated data, we show the method is more robust and accurate than current techniques in both single-sample and multi-region sequencing data. With careful data curation and interpretation, we show how the method allows minimizing the confounding factors that affect non-evolutionary methods, leading to a more accurate recovery of the evolutionary history of human tumors..
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.