New Genomic Signals Underlying the Emergence of Human Proto-Genes

Grandchamp, Anna; Berk, Katrin; Dohmen, Elias; Bornberg‐Bauer, Erich

doi:10.3390/genes13020284

Cited by 13 publications

(12 citation statements)

References 91 publications

“…Further analyses indicate that the biased new gene duplicates, expressed in a particular developmental stage, show greater divergence in expression among orthologues and paralogues. The expression analyses also provided new data to support a pattern previously reported in other organisms: new genes show narrower expression patterns across developmental stages or tissues the younger they are, for example, Drosophila [6, 20], Oryza [21], and primates [22].…”

supporting

confidence: 70%

“…In addition, Grandchamp et al [22], in this Genes volume, thoroughly examined four properties in human protogenes, i.e., the genes in an early stage of de novo origination [29]: intron acquisition, regulatory elements, UTRs, and domain evolution. The extensive data were characterized as showing significant differences between protogenes and old genes, revealing a growth process of gene structures with age.…”

mentioning

confidence: 99%

Evolutionary New Genes in a Growing Paradigm

Betrán

Long

2022

Genes

“…To quantify the genomic relationship between the seven lines of D. melanogaster, a dated phylogenetic tree was generated with the software BEAST (Bouckaert et al, 2014). The tree was based on the alignment of the 11,568 longest proteins per genes in common in the seven lines, retrieved from the seven genomes assembled de novo (Grandchamp et al, 2022a), and the date of divergence from European lines to the outgroup Zambian line was set to 12,843 years, following the results of Laurent et al (2011). The Zambian population was well identified as an outgroup population, as expected from its geographic isolation to the European populations (Fig.…”

Section: Results

mentioning

confidence: 99%

“…To quantify the genomic relationship between the seven lines of D. melanogaster, a dated phylogenetic tree was estimated with the software BEAST (Bouckaert et al, 2014). The tree is based on the alignment of the 11,568 longest proteins per genes in common in the seven lines, retrieved from the seven genomes assembled de novo (Grandchamp et al, 2022a). The branch lengths are based on a previously estimated date of divergence from European lines to the ancestral African population, here approximated by the Zambian line, and set to 12,843 years (Laurent et al, 2011).…”

Section: Results

mentioning

confidence: 99%

“…For Definition 3, a Python script was built to directly assess the overlapping positions and nucleotides of de novo transcripts between lines. To test the accuracy of our Definitions, we used a set of 11,000 proto-genes from Grandchamp et al (2022a), and distributed them into orthogroups by using the software OrthoFinder (Emms and Kelly, 2019), and our script for Definition 1. We found 93% of similarity between the results from OrthoFinder and our script (OrthoFinder: 5687 orthogroups, Our script: 6124 orthogroups) (Supplemental deposit).…”

Section: Model and Methods

mentioning

confidence: 99%
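
The excerpt above does not say how the 93% similarity was measured. One possible reading, offered here only as an assumption, is the ratio of the two reported orthogroup counts. The sketch below checks that arithmetic; the function name `count_ratio` is hypothetical and is not taken from the cited work.

```python
def count_ratio(a: int, b: int) -> float:
    """Return the smaller of two positive counts divided by the larger."""
    if a <= 0 or b <= 0:
        raise ValueError("counts must be positive")
    return min(a, b) / max(a, b)


# Orthogroup counts quoted in the citation statement.
orthofinder_groups = 5687
script_groups = 6124

# Assumption: similarity is read as the ratio of the two counts.
ratio = count_ratio(orthofinder_groups, script_groups)
print(f"count ratio: {ratio:.1%}")
```

A count ratio says nothing about whether the two methods grouped the same proto-genes together, so a membership-based comparison would be needed to confirm real agreement.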

Quantification and modeling of turnover dynamics of de novo transcripts in Drosophila melanogaster

Grandchamp

Czuppon

Bornberg‐Bauer

2023

Preprint

Self Cite

Most of the transcribed genome in eukaryotes does not code for proteins but produces non-genic transcripts. Among these non-genic transcripts, some are newly transcribed when compared to an evolutionarily close outgroup, and are referred to as de novo transcripts. Despite their creative role for genomic innovations as potential predecessors of de novo genes, little is known about the rates at which de novo transcripts emerge and disappear. Such a rate estimation requires a precise comparison of the absence and presence of de novo transcripts between phylogenetically close samples, and a mathematical model based on evolutionary processes. To detect newly emerged transcripts on short evolutionary distances, we use DNA long reads and RNA short reads from lines derived from seven populations of Drosophila melanogaster. Transcripts from the seven lines were distributed in orthogroups according to three newly proposed definitions. Overall, each line contains between 2,708 and 3,116 de novo transcripts with most of them being specific to a single line. Depending on the definition of transcript orthogroups, we estimate that between 0.13 and 0.34 transcripts are gained per year and that a transcript is lost at a rate between 6.6 × 10^-5 and 2 × 10^-4 per year. This suggests frequent exploration of new genomic sequences mediated through a high turnover of transcripts. Our study therefore provides novel insight on non-genic transcript dynamics on a very short evolutionary time-scale with implications for the process of de novo gene birth.

Heterologous expression of naturally evolved putative de novo proteins with chaperones

Aubel

Berk

et al. 2022

Protein Science

Self Cite

Over the past decade, evidence has accumulated that new protein‐coding genes can emerge de novo from previously non‐coding DNA. Most studies have focused on large scale computational predictions of de novo protein‐coding genes across a wide range of organisms. In contrast, experimental data concerning the folding and function of de novo proteins are scarce. This might be due to difficulties in handling de novo proteins in vitro, as most are short and predicted to be disordered. Here, we propose a guideline for the effective expression of eukaryotic de novo proteins in Escherichia coli. We used 11 sequences from Drosophila melanogaster and 10 from Homo sapiens, that are predicted de novo proteins from former studies, for heterologous expression. The candidate de novo proteins have varying secondary structure and disorder content. Using multiple combinations of purification tags, E. coli expression strains, and chaperone systems, we were able to increase the number of solubly expressed putative de novo proteins from 30% to 62%. Our findings indicate that the best combination for expressing putative de novo proteins in E. coli is a GST‐tag with T7 Express cells and co‐expressed chaperones. We found that, overall, proteins with higher predicted disorder were easier to express.

Statement: Today, we know that proteins do not only evolve by duplication and divergence of existing proteins but also arise from previously non‐coding DNA. These proteins are called de novo proteins. Their properties are still poorly understood and their experimental analysis faces major obstacles. Here, we aim to present a starting point for soluble expression of de novo proteins with the help of chaperones and thereby enable further characterization.
