Satellite DNAs (satDNAs) are among the most abundant repetitive DNAs found in eukaryote genomes, where they participate in a variety of biological roles, from being components of important chromosome structures to gene regulation. Experimental methodologies used before the genomic era were insufficient, as well as too laborious and time-consuming, to recover the collection of all satDNAs from a genome. Today, the availability of whole sequenced genomes combined with the development of specific bioinformatic tools is expected to foster the identification of virtually the entire “satellitome” of a particular species. While whole genome assemblies are important to obtain a global view of genome organization, most of them are incomplete and lack repetitive regions. We applied short-read sequencing and similarity clustering to perform a de novo identification of the most abundant satellite families in two Drosophila species from the virilis group, Drosophila virilis and D. americana, using the Tandem Repeat Analyzer (TAREAN) and RepeatExplorer pipelines. These species were chosen because they have been used as models to understand satDNA biology since the early 1970s. We combined the computational approach with data from the literature and chromosome mapping to obtain an overview of the major tandem repeat sequences of these species. The fact that all of the abundant tandem repeats (TRs) we detected were previously identified in the literature allowed us to evaluate the efficiency of TAREAN in correctly identifying true satDNAs. Our results indicate that raw sequencing reads can be efficiently used to detect satDNAs, but that abundant tandem repeats present in dispersed arrays or associated with transposable elements are frequent false positives. We demonstrate that TAREAN, together with its parent method RepeatExplorer, may be used as a resource to detect tandem repeats associated with transposable elements and also to reveal families of dispersed tandem repeats.
Satellite DNAs are among the most abundant repetitive DNAs found in eukaryote genomes, where they participate in a variety of biological roles, from being components of important chromosome structures to gene regulation. Experimental methodologies used before the genomic era were insufficient, as well as too laborious and time-consuming, to recover the collection of all satDNAs from a genome. Today, the availability of whole sequenced genomes combined with the development of specific bioinformatic tools is expected to foster the identification of virtually all of the "satellitome" of a particular species. While whole genome assemblies are important to obtain a global view of genome organization, most assemblies are incomplete and lack repetitive regions. Here, we applied short-read sequencing and similarity clustering to perform a de novo identification of the most abundant satellite families in two Drosophila species from the virilis group: Drosophila virilis and D. americana. These species were chosen because they have been used as models to understand satDNA biology since the early 1970s. We combined computational tandem repeat detection via similarity-based read clustering (implemented in the Tandem Repeat Analyzer pipeline, "TAREAN") with data from the literature and chromosome mapping to obtain an overview of satDNAs in D. virilis and D. americana. The fact that all of the abundant tandem repeats we detected were previously identified in the literature allowed us to evaluate the efficiency of TAREAN in correctly identifying true satDNAs. Our results indicate that raw sequencing reads can be efficiently used to detect satDNAs, but that abundant tandem repeats present in dispersed arrays or associated with transposable elements are frequent false positives. We demonstrate that TAREAN, together with its parent method RepeatExplorer, may be used as a resource to detect tandem repeats associated with transposable elements and also to reveal families of dispersed tandem repeats.

Introduction

The genome of eukaryotes contains a variety of repetitive DNA sequences, which comprise most of the nuclear DNA of several organisms, including animals, plants and insects [1,2]. Among them are the satellite DNAs (satDNAs), usually defined as abundant, tandemly repeated noncoding DNA sequences, forming large arrays (hundreds of kilobases up to megabases), typically located in the heterochromatic regions of the chromosomes [3,4], although short arrays may additionally be present in the euchromatin [5,6]. The collection of satDNAs in the genome, also known as the "satellitome", usually represents a significant fraction (>30%) of several animal and plant genomes. Other classes of noncoding tandem repeats include the microsatellites, with repeat units less than 10 bp long, array sizes around 100 bp and a scattered distribution throughout the genome; and the minisatellites, with repeats between 10 and 100 bp...
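The repeat-unit size classes mentioned in the Introduction can be illustrated with a minimal Python sketch. It is not part of TAREAN or RepeatExplorer; the function names `smallest_repeat_unit` and `classify_by_unit_length` are hypothetical, the example sequence is arbitrary, and treating 100 bp as an inclusive upper bound for minisatellites is an assumption. The sketch looks only at repeat-unit length and ignores array size, abundance and genomic distribution, which the text also uses to distinguish these classes. Because the excerpt is truncated after the minisatellite definition, longer units are left unclassified rather than guessed.

```python
def smallest_repeat_unit(array_seq: str) -> str:
    """Return the shortest unit whose exact repetition rebuilds array_seq.

    Assumes a perfect (mutation-free) tandem array; real arrays diverge,
    so this is an illustration, not a detection method.
    """
    n = len(array_seq)
    for k in range(1, n + 1):
        # A unit of length k must divide the array evenly and tile it exactly.
        if n % k == 0 and array_seq[:k] * (n // k) == array_seq:
            return array_seq[:k]
    return array_seq  # only reached for the empty string


def classify_by_unit_length(unit_len: int) -> str:
    """Classify a tandem repeat by repeat-unit length only.

    Thresholds follow the Introduction: microsatellites < 10 bp,
    minisatellites 10 to 100 bp (inclusive upper bound is an assumption).
    """
    if unit_len < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_len < 10:
        return "microsatellite"
    if unit_len <= 100:
        return "minisatellite"
    # The excerpt is truncated here, so no class is asserted.
    return "unclassified"


if __name__ == "__main__":
    array_seq = "AACAGAC" * 5  # arbitrary 7 bp unit repeated five times
    unit = smallest_repeat_unit(array_seq)
    print(unit, len(unit), classify_by_unit_length(len(unit)))
```

Note that satDNAs themselves are not defined by unit length in the text, so a length-only rule cannot identify them; array size and abundance would have to be measured as well.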
Satellite DNAs (satDNAs) are a class of tandemly repeated, non-protein-coding DNA sequences found in abundance in eukaryotic genomes. They can be functional and affect genome architecture in many ways, and their rapid evolution has consequences for species diversification. We took advantage of the recent availability of sequenced genomes from 23 Drosophila species from the montium group to study their satDNA landscape. For this purpose, we used publicly available whole-genome sequencing Illumina reads and the TAREAN (tandem repeat analyzer) pipeline. We provide the characterization of 101 non-homologous satDNA families in this group, 93 of which are described here for the first time. Their repeat units vary in size from 4 bp to 1897 bp, but most satDNAs show repeat units < 100 bp long and, among them, repeats ≤ 10 bp are the most frequent. The genomic contribution of the satDNAs ranges from ~1.4% to 21.6%. There is no significant correlation between satDNA content and genome size in the 23 species. We also found that at least one satDNA originated from an expansion of the central tandem repeats (CTRs) present inside a Helitron transposon. Finally, some satDNAs may be useful as taxonomic markers for the identification of species or subgroups within the group.
The "cut-and-paste" P-element present in some Diptera illustrates two important abilities of transposable elements: to move within genomes and to be transmitted between non-mating species, a phenomenon known as horizontal transposon transfer (HTT). Recent studies reported an HTT of the P-element from Drosophila melanogaster to D. simulans. P-elements first appeared in D. simulans European samples collected in 2006 and spread across several populations from Europe, Africa, North America and Japan within seven years. Nevertheless, no P-element was found in South American populations of D. simulans collected between 2002 and 2009. We investigated the presence of the P-element in D. simulans collected in five Brazilian localities between 2018 and 2019, using a combination of methodologies, including PCR, DNA sequencing and FISH on chromosomes. Our experiments revealed the presence of the P-element in all sampled individuals from the five localities. The number of P-elements per individual varied from 11 to 20 copies, and truncated copies were also observed. Altogether, our results show that the P-element invasion in D. simulans is at an advanced stage in Brazil and, together with other recent studies, confirm the remarkably rapid invasion of P-elements across worldwide D. simulans populations.