2024
DOI: 10.1101/2024.02.27.582234
Preprint

Sequence modeling and design from molecular to genome scale with Evo

Eric Nguyen, Michael Poli, Matthew G. Durrant, et al.

Abstract: The genome is a sequence that completely encodes the DNA, RNA, and proteins that orchestrate the function of a whole organism. Advances in machine learning combined with massive datasets of whole genomes could enable a biological foundation model that accelerates the mechanistic understanding and generative design of complex molecular interactions. We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to genome scale. Using an architecture based on advances i…

Cited by 43 publications (12 citation statements). References: 115 publications.
“…It does, however, suggest that a path forward for the generation of whole genomes that resemble members of existing virus families may require the development of family-specific models. This may also be achieved by prepending taxonomic labels to sequences in the training dataset, as done elsewhere (Nguyen et al, 2024). However, the low data availability for the vast majority of viral families will challenge their development in the short-term.…”
Section: Discussion (mentioning; confidence: 99%)
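The conditioning strategy mentioned in this statement, prepending a taxonomic label to each training sequence so that the label can later be used as a prompt, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Evo training pipeline: the bracketed tag format, the helper names `prepend_taxonomic_label`, `build_conditioned_dataset`, and `make_prompt`, and the toy sequences are all hypothetical.

```python
VALID_BASES = set("ACGTN")


def prepend_taxonomic_label(sequence: str, label: str) -> str:
    """Prefix a DNA sequence with a bracketed taxonomic tag (hypothetical format)."""
    seq = sequence.upper()
    if not set(seq) <= VALID_BASES:
        raise ValueError("sequence contains non-nucleotide characters")
    return f"[{label}]{seq}"


def build_conditioned_dataset(records):
    """Turn (family, sequence) pairs into tagged training strings."""
    return [prepend_taxonomic_label(seq, family) for family, seq in records]


def make_prompt(label: str) -> str:
    """At sampling time, the tag alone serves as the prompt that conditions generation."""
    return f"[{label}]"


# Toy records only; real viral genomes are thousands of bases or longer.
training = build_conditioned_dataset([
    ("Microviridae", "atgtttcaaactttt"),
    ("Inoviridae", "ATGAAAAAG"),
])
for example in training:
    print(example)
print(make_prompt("Microviridae"))
```

Here the tag is plain text that the model learns to associate with the sequences following it; a real tokenizer would more likely reserve a dedicated special token per label so that tags can never collide with nucleotide characters. As the statement notes, families with few sequenced genomes would contribute few tagged examples, which limits how well such conditioning can work for them.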
“…While this work has focused on adaptation of PLMs, large protein structure models such as OmegaFold, RosettaFold, ESMFold, and AlphaFold (3, 27–29) have begun to see use as foundation models and could be similarly amenable to tuning for downstream tasks with PEFT methods. In addition, it remains to be shown whether PEFT methods are equally competitive for language models trained on other types of biological sequences, such as DNA (46) or SMILES strings (22).…”
Section: Discussion (mentioning; confidence: 99%)
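Parameter-efficient fine-tuning (PEFT) covers several techniques, and the statement does not name a specific one. As an illustration only, the sketch below uses low-rank adaptation (LoRA), one widely used PEFT method, in plain Python: the pretrained weight `W0` stays frozen and only the small factors `A` and `B` would be trained. The layer dimensions, rank `r`, scaling `alpha`, and helper names are assumptions chosen for illustration.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W0, A, B, x, alpha):
    """h = W0 x + (alpha / r) * B (A x); W0 is frozen, only A and B are trainable."""
    r = len(A)  # rank of the low-rank update
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


d, k, r = 64, 64, 4  # illustrative layer size and adapter rank
rng = random.Random(0)
W0 = [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(d)]  # pretrained, frozen (d x k)
A = [[rng.gauss(0.0, 0.02) for _ in range(k)] for _ in range(r)]   # trainable (r x k)
B = [[0.0] * r for _ in range(d)]                                   # trainable (d x r), zero-initialized
x = [1.0] * k

full_params = d * k        # weights updated by full fine-tuning of this layer
lora_params = r * (d + k)  # weights updated by the low-rank adapter
print(full_params, lora_params)

# With B = 0 the adapted layer reproduces the frozen layer exactly.
print(lora_forward(W0, A, B, x, alpha=8.0) == matvec(W0, x))
```

Zero-initializing `B` means tuning starts from the pretrained model's behavior, and the parameter saving (here 512 trainable weights instead of 4,096) grows with layer size. Whether such adapters transfer as well to DNA or SMILES language models is exactly the open question the statement raises.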
“…reduce changes in promoter activity, regulatory functions, and translation rate and avoid generating unwanted cryptic promoters and translated ORFs. The development of predictive models capable of forecasting transcriptional and translational effects and integrating these predictions to refactor codon composition is expected to minimize the negative fitness impact of synonymous genome recoding [83–92].…”
Section: Future Genome Design Projects Should Focus On Developing And... (mentioning; confidence: 99%)
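The basic constraint of synonymous recoding, that the encoded protein must not change, can be checked mechanically. The sketch below applies a hypothetical codon-swap table, rejects any swap that is not synonymous under the standard genetic code, and flags motifs from a fixed list that the recoding newly introduces (`TATAAT`, the E. coli σ70 -10 consensus, stands in for a cryptic promoter element). It does not attempt the predictive modeling of transcriptional and translational effects that the statement calls for; the swap table, motif list, helper names, and toy sequence are all assumptions.

```python
# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split an in-frame coding sequence into codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    return "".join(CODON_TABLE[c] for c in codons(cds))


def recode(cds, swaps):
    """Apply codon swaps, rejecting any that would change the amino acid or stop."""
    for src, dst in swaps.items():
        if CODON_TABLE[src] != CODON_TABLE[dst]:
            raise ValueError(f"{src}->{dst} is not synonymous")
    return "".join(swaps.get(c, c) for c in codons(cds))


def introduced_motifs(original, recoded, motifs):
    """Motifs present after recoding but absent before (forward strand only)."""
    return [m for m in motifs if m in recoded and m not in original]


# Hypothetical swap table: TTA/TTG leucine codons to CTG, TAG stop to TAA.
SWAPS = {"TTA": "CTG", "TTG": "CTG", "TAG": "TAA"}
original = "ATGTTATTGAAATAG"  # toy ORF encoding M L L K *
recoded = recode(original, SWAPS)

print(recoded)
print(translate(original) == translate(recoded))
print(introduced_motifs(original, recoded, ["TATAAT"]))
```

A rule-based motif list like this only catches elements someone thought to enumerate; the predictive models the statement anticipates would instead score each candidate recoding for its expected effect on promoter activity and translation rate.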