Next-generation sequencing (NGS) has revolutionized the field of genome assembly because of its much higher data throughput and much lower cost compared with traditional Sanger sequencing. However, NGS poses new computational challenges to de novo genome assembly. Among these challenges, GC bias in NGS data is known to hinder genome assembly, but it is not clear to what extent GC bias affects genome assembly in general. In this work, we conduct a systematic analysis of the effects of GC bias on genome assembly. Our analyses reveal that GC bias lowers assembly completeness only when the degree of GC bias is above a threshold. At a strong GC bias, the resulting assembly fragmentation can be explained by the low read coverage in the GC-poor or GC-rich regions of a genome. This effect is observed for all the assemblers under study. Increasing the total amount of NGS data thus rescues the assembly fragmentation caused by GC bias. However, the amount of data needed for a full rescue depends on the distribution of GC contents. Both low and high coverage depths due to GC bias lower the accuracy of assembly. These findings provide guidance toward better de novo genome assembly in the presence of GC bias.
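To make the notion of "GC-poor or GC-rich regions" concrete, the following minimal Python sketch computes the GC fraction of a sequence in fixed-size windows and flags the windows with extreme GC content. It is only an illustration and is not taken from the study: the function names, the window size, and the 0.3 and 0.7 cutoffs are all hypothetical choices.

```python
def gc_fraction(seq):
    """Return the fraction of G/C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences with no A/C/G/T bases (empty or all N) are reported as 0.0.
    return gc / total if total else 0.0


def windowed_gc(seq, window=100):
    """Return (start, gc_fraction) for consecutive non-overlapping windows.

    The final window may be shorter than `window`.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq), window)
    ]


def extreme_windows(seq, window=100, low=0.3, high=0.7):
    """Return windows whose GC fraction is below `low` or above `high`.

    The cutoffs are illustrative assumptions, not values from the study.
    """
    return [(s, g) for s, g in windowed_gc(seq, window) if g < low or g > high]


if __name__ == "__main__":
    # Toy sequence: an AT-only block, a GC-only block, and a balanced block.
    toy = "AT" * 50 + "GC" * 50 + "ACGT" * 25
    for start, gc in windowed_gc(toy):
        print(start, round(gc, 2))
    print(extreme_windows(toy))
```

With real data, windows flagged this way would be the natural places to compare read coverage against the genome-wide average.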
Background: Tetraena mongolica (Zygophyllaceae) is an endangered endemic species in western Inner Mongolia, China. For endemic species with a limited geographical range and declining populations, historical patterns of demography and hierarchical genetic structure are important for determining population structure, and also provide information for developing effective and sustainable management plans. In this study, we assess genetic variation, population structure, and phylogeography of T. mongolica from eight populations. Furthermore, we evaluate conservation and management units to inform conservation efforts.
Results: Sequence variation and spatial apportionment of the atpB-rbcL noncoding spacer region of the chloroplast DNA were used to reconstruct the phylogeography of T. mongolica. A total of 880 bp was sequenced from eight extant populations throughout the whole range of its distribution. At the cpDNA locus, high levels of genetic differentiation among populations and low levels of genetic variation within populations were detected, indicating that most seed dispersal was restricted within populations.
Conclusions: Analysis of the BEAST skyline plot indicated demographic fluctuations, caused by frequent flooding of the Yellow River and human disturbance, which led to random losses of genetic polymorphisms from populations. Nested clade analysis revealed that restricted gene flow with isolation by distance, plus occasional long-distance dispersal, is the main evolutionary factor affecting the phylogeography and population structure of T. mongolica. For setting a conservation management plan, each population of T. mongolica should be recognized as a conservation unit.