2024
DOI: 10.1101/2024.10.02.616302
Preprint

EvoSeq-ML: Advancing Data-Centric Machine Learning with Evolutionary-Informed Protein Sequence Representation and Generation

Mehrsa Mardikoraem, Nathaniel Pascual, Patrick Finneran, et al.

Abstract: In protein engineering, machine learning (ML) advancements have led to significant progress, including protein structure prediction (e.g., AlphaFold), sequence representation through language models, and novel protein generation. However, the impact of data curation on ML model performance is underexplored. As more sequence and structural data become available, a data-centric approach is increasingly favored over a model-centric method. A data-centric approach prioritizes high-quality, domain-specific data, en…

Cited by 0 publications
References 43 publications