Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
BackgroundSatellite DNA is a rapidly diverging, largely repetitive DNA component of many eukaryotic genomes. Here we analyse the evolutionary dynamics of a satellite DNA repeat in the genomes of a group of Asian subtropical lady slipper orchids (Paphiopedilum subgenus Parvisepalum and representative species in the other subgenera/sections across the genus). A new satellite repeat in Paphiopedilum subgenus Parvisepalum, SatA, was identified and characterized using the RepeatExplorer pipeline in HiSeq Illumina reads from P. armeniacum (2n = 26). Reconstructed monomers were used to design a satellite-specific fluorescent in situ hybridization (FISH) probe. The data were also analysed within a phylogenetic framework built using the internal transcribed spacer (ITS) sequences of 45S nuclear ribosomal DNA.ResultsSatA comprises c. 14.5% of the P. armeniacum genome and is specific to subgenus Parvisepalum. It is composed of four primary monomers that range from 230 to 359 bp and contains multiple inverted repeat regions with hairpin-loop motifs. A new karyotype of P. vietnamense (2n = 28) is presented and shows that the chromosome number in subgenus Parvisepalum is not conserved at 2n = 26, as previously reported. The physical locations of SatA sequences were visualised on the chromosomes of all seven Paphiopedilum species of subgenus Parvisepalum (2n = 26–28), together with the 5S and 45S rDNA loci using FISH. The SatA repeats were predominantly localisedin the centromeric, peri-centromeric and sub-telocentric chromosome regions, but the exact distribution pattern was species-specific.ConclusionsWe conclude that the newly discovered, highly abundant and rapidly evolving satellite sequence SatA is specific to Paphiopedilum subgenus Parvisepalum. SatA and rDNA chromosomal distributions are characteristic of species, and comparisons between species reveal that the distribution patterns generate a strong phylogenetic signal. We also conclude that the ancestral chromosome number of subgenus Parvisepalum and indeed of all Paphiopedilum could be either 2n = 26 or 28, if P. vietnamense is sister to all species in the subgenus as suggested by the ITS data.Electronic supplementary materialThe online version of this article (10.1186/s12864-018-4956-7) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.