2022
DOI: 10.48550/arxiv.2207.13921
Preprint

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

Abstract: AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) and templates as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs and templates from protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences o…

Cited by 18 publications
(21 citation statements)
References 21 publications
“…Evaluation. Baseline methods include antibody-specific structure prediction (ABodyBuilder [19], DeepAb [34], ABlooper [1], NanoNet [8] and IgFold [31]) and general protein structure prediction, either MSA-based (AlphaFold [15] and AlphaFold-Multimer [10]) or MSA-free (HelixFold-Single [11], ESMFold [20], and OmegaFold [40]).…”
Section: Results (mentioning)
confidence: 99%
“…These models mainly use a attention-based deep neural network to capture long-range inter-residue relationship and coevolutionary information encoded in the sequence. Previous work [9,28,23,20] has shown that with small-scale supervised training for downstream tasks, PLMs can capture some functional and structure properties of proteins, including secondary structure, binding residues [21], tertiary contact and protein structure [20,40,11]. Some work [29] also extends the PLMs to model a set of aligned sequences in a MSA using axial attention.…”
Section: Related Work (mentioning)
confidence: 99%
“…without MSAs or templates (model_5_ptm zeroing and masking MSA features). We compare to recent pLM-based models trained for protein structure prediction from sequence alone -specifically OmegaFold [5] and HelixFold-Single [13], in both cases using the official publicly available code and checkpoints. The recent ESMFold [12] model (which is a minimally modified AlphaFold Evoformer and Structure Module operating on ESM-2 embeddings) does not yet have a public release, and so we cannot report a direct comparison given the different validation sets between their paper and ours.…”
Section: Datasets (mentioning)
confidence: 99%
“…An appealing approach is therefore to replace explicit representations of external sequence/structural databases (i.e. MSAs/templates) with pLM embeddings, which has been the foundation of recent models that predict 3D structures from protein sequence alone [5,12,13].…”
Section: Introduction (mentioning)
confidence: 99%