Endogenous retrovirus (ERV) families are derived from their exogenous counterparts by means of a process of germ-line infection and proliferation within the host genome. Several families in the human and mouse genomes now consist of many hundreds of elements and, although several candidates have been proposed, the mechanism behind this proliferation has remained uncertain. To investigate this mechanism, we reconstructed the ratio of nonsynonymous to synonymous changes and the acquisition of stop codons during the evolution of the human ERV family HERV-K(HML2). We show that all genes, including the env gene, which is necessary only for movement between cells, have been under continuous purifying selection. This finding strongly suggests that the proliferation of this family has been almost entirely due to germ-line reinfection, rather than retrotransposition in cis or complementation in trans, and that an infectious pool of endogenous retroviruses has persisted within the primate lineage throughout the past 30 million years. Because many elements within this pool would have been unfixed, it is possible that the HERV-K(HML2) family still contains infectious elements at present, despite their apparent absence in the human genome sequence. Analysis of the env gene of eight other HERV families indicated that reinfection is likely to be the most common mechanism by which endogenous retroviruses proliferate in their hosts.
Background: The relationship between DNA sequence and encoded information is still an unsolved puzzle. The number of protein-coding genes in higher eukaryotes identified by genome projects is lower than was expected, while a considerable amount of putatively non-coding transcription has been detected. Functional small open reading frames (smORFs) are known to exist in several organisms. However, coding sequence detection methods are biased against detecting such very short open reading frames. Thus, a substantial number of non-canonical coding regions encoding short peptides might await characterization. Results: Using bio-informatics methods, we have searched for smORFs of less than 100 amino acids in the putatively non-coding euchromatic DNA of Drosophila melanogaster, and initially identified nearly 600,000 of them. We have studied the pattern of conservation of these smORFs as coding entities between D. melanogaster and Drosophila pseudoobscura, their presence in syntenic and in transcribed regions of the genome, and their ratio of conservative versus non-conservative nucleotide changes. For negative controls, we compared the results with those obtained using random short sequences, while a positive control was provided by smORFs validated by proteomics data. Conclusions: The combination of these analyses led us to postulate the existence of at least 401 functional smORFs in Drosophila, with the possibility that as many as 4,561 such functional smORFs may exist.
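The initial search step described in this abstract, collecting every open reading frame shorter than 100 amino acids, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_smorfs`, its parameters, the restriction to the forward strand, and the choice to start each ORF at the first ATG after the previous stop are all assumptions made for illustration.

```python
# Hypothetical illustration only; not the method used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq, max_codons=100, min_codons=2):
    """Return (start, end, frame) for forward-strand ORFs below max_codons.

    start is the index of the A in ATG, end is one past the stop codon,
    so seq[start:end] is the full ORF including its stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # coding codons, stop excluded
                if min_codons <= n_codons < max_codons:
                    hits.append((start, i + 3, frame))
                start = None
    return hits


example = "CCATGAAATTTTAGCC"
orfs = find_smorfs(example)
```

A scan like this returns a very large number of candidates on real genomic sequence, which is why the abstract goes on to filter by conservation, synteny, transcription, and substitution pattern. A fuller version would also scan the reverse complement.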
Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome
Genome evolution and size variation in multicellular organisms are profoundly influenced by the activity of retrotransposons. In higher eukaryotes with compact genomes retrotransposons are found in lower copy numbers than in larger genomes, which could be due to either suppression of transposition or to elimination of insertions, and are non-randomly distributed along the chromosomes. The evolutionary mechanisms constraining retrotransposon copy number and chromosomal distribution are still poorly understood.
The correlation coefficient is commonly used as a measure of the divergence of gene expression profiles between different species. Here we point out a potential problem with this statistic: if measurement error is large relative to the differences in expression, the correlation coefficient will tend to show high divergence for genes that have relatively uniform levels of expression across tissues or time points. We show that genes with a conserved uniform pattern of expression have significantly higher levels of expression divergence, when measured using the correlation coefficient, than other genes, in a data set from mouse, rat, and human. We also show that the Euclidean distance yields low estimates of expression divergence for genes with a conserved uniform pattern of expression.
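The statistical point can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not the analysis from the paper: the two expression profiles, the noise level, the number of replicates, and the helper names (`pearson`, `euclidean`, `mean_divergence`) are all hypothetical. Each "species" receives the same true profile plus independent Gaussian measurement error, so the true divergence is zero in both cases.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def euclidean(x, y):
    """Euclidean distance between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_divergence(profile, noise_sd, n_reps, rng):
    """Average (1 - r) and Euclidean distance between two noisy copies
    of the same true profile, i.e. a perfectly conserved gene."""
    corr_div = 0.0
    eucl = 0.0
    for _ in range(n_reps):
        a = [v + rng.gauss(0, noise_sd) for v in profile]
        b = [v + rng.gauss(0, noise_sd) for v in profile]
        corr_div += 1 - pearson(a, b)
        eucl += euclidean(a, b)
    return corr_div / n_reps, eucl / n_reps


rng = random.Random(0)  # fixed seed so the run is repeatable
uniform = [10.0] * 8  # assumed: same level in all 8 tissues
variable = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]  # assumed: tissue-specific

uniform_corr, uniform_eucl = mean_divergence(uniform, 1.0, 2000, rng)
variable_corr, variable_eucl = mean_divergence(variable, 1.0, 2000, rng)
```

For the uniform profile the only variation across tissues is noise, so the correlation between species is close to zero and `1 - r` reports high divergence. For the variable profile the signal dominates and `1 - r` stays small. The Euclidean distance depends only on the noise in this setup, so it is about the same for both genes.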
Endogenous retroviruses (ERVs) result from germ line infections by exogenous retroviruses. They can proliferate within the genome of their host species until they are either inactivated by mutation or removed by recombinational deletion. ERVs belong to a diverse group of mobile genetic elements collectively termed transposable elements (TEs). Numerous studies have attempted to elucidate the factors determining the genomic distribution and persistence of TEs. Here we show that, within humans, gene density and not recombination rate correlates with fixation of endogenous retroviruses, whereas the local recombination rate determines their persistence in a full-length state. Recombination does not appear to influence fixation either via the ectopic exchange model or by indirect models based on the efficacy of selection. We propose a model linking rates of meiotic recombination to the probability of recombinational deletion to explain the effect of recombination rate on persistence. Chromosomes 19 and Y are exceptions, possessing more elements than other regions, and we suggest this is due to low gene density and elevated rates of human ERV integration in males for chromosome Y and segmental duplication for chromosome 19.