Cell culture systems allow key insights into biological mechanisms yet suffer from irreproducible outcomes in part because of cross-contamination or mislabelling of cell lines. Cell line misidentification can be mitigated by the use of genotyping protocols, which have been developed for human cell lines but are lacking for many important model species. Here we leverage the classical observation that transposable elements (TEs) proliferate in cultured Drosophila cells to demonstrate that genome-wide TE insertion profiles can reveal the identity and provenance of Drosophila cell lines. We identify multiple cases where TE profiles clarify the origin of Drosophila cell lines (Sg4, mbn2, and OSS_E) relative to published reports, and also provide evidence that insertions from only a subset of LTR retrotransposon families are necessary to mark Drosophila cell line identity. We also develop a new bioinformatics approach to detect TE insertions and estimate intra-sample allele frequencies in legacy whole-genome sequencing data (called ngs_te_mapper2), which revealed loss of heterozygosity as a mechanism shaping the unique TE profiles that identify Drosophila cell lines. Our work contributes to the general understanding of the forces impacting metazoan genomes as they evolve in cell culture and paves the way for high-throughput protocols that use TE insertions to authenticate cell lines in Drosophila and other organisms.
The Drosophila melanogaster P transposable element provides one of the best cases of horizontal transfer of a mobile DNA sequence in eukaryotes. Invasion of natural populations by the P element has led to a syndrome of phenotypes known as P-M hybrid dysgenesis that emerges when strains differing in their P element composition mate and produce offspring. Despite extensive research on many aspects of P element biology, many questions remain about the genomic basis of variation in P-M dysgenesis phenotypes across populations. Here we compare estimates of genomic P element content with gonadal dysgenesis phenotypes for isofemale strains obtained from three worldwide populations of D. melanogaster to illuminate the molecular basis of natural variation in cytotype status. We show that P element abundance estimated from genome sequences of isofemale strains is highly correlated across different bioinformatics approaches, but that abundance estimates are sensitive to method and filtering strategies as well as incomplete inbreeding of isofemale strains. We find that P element content varies significantly across populations, with strains from a North American population having fewer P elements but a higher proportion of full-length elements than strains from populations sampled in Europe or Africa. Despite these geographic differences in P element abundance and structure, neither the number of P elements nor the ratio of full-length to internally-truncated copies is strongly correlated with the degree of gonadal dysgenesis exhibited by an isofemale strain. Thus, variation in P element abundance and structure across different populations does not necessarily lead to corresponding geographic differences in gonadal dysgenesis phenotypes. Finally, we confirm that population differences in the abundance and structure of P elements that are observed from isofemale lines can also be observed in pool-seq samples from the same populations. Our work supports the view that genomic P element content alone is not sufficient to explain variation in gonadal dysgenesis across strains of D. melanogaster, and informs future efforts to decode the genomic basis of geographic and temporal differences in P element induced phenotypes.
Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Animal cell lines cultured for extended periods often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In Drosophila, many established cell lines also exhibit massive proliferation of transposable elements (TEs) relative to wild-type flies. To better understand the role of transposition during long-term animal somatic cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. Relative to comparable data from inbred whole flies, WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called "TELR" that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ~130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by somatic transposition in cell culture after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TE copies within the S2R+ genome, which revealed that proliferation of different TE families during cell line evolution in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are not amenable to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.
Cultured cells are widely used in molecular biology despite poor understanding of how cell line genomes change in vitro over time. Previous work has shown that Drosophila cultured cells have a higher transposable element (TE) content than whole flies, but whether this increase in TE content resulted from an initial burst of transposition during cell line establishment or ongoing transposition in cell culture remains unclear. Here we sequence the genomes of 25 sub-lines of Drosophila S2 cells and show that TE insertions provide abundant markers for the phylogenetic reconstruction of diverse sub-lines in a model animal cell culture system. DNA copy number evolution across S2 sub-lines revealed dramatically different patterns of genome organization that support the overall evolutionary history reconstructed using TE insertions. Analysis of TE insertion site occupancy and ancestral states support a model of ongoing transposition dominated by episodic activity of a small number of retrotransposon families. Our work demonstrates that substantial genome evolution occurs during long-term Drosophila cell culture, which may impact the reproducibility of experiments that do not control for sub-line identity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.