2022
DOI: 10.21203/rs.3.rs-1855828/v1
Preprint

ProteinSGM: Score-based generative modeling for de novo protein design

Abstract: Score-based generative models are a novel class of generative models that have shown state-of-the-art sample quality in image synthesis, surpassing the performance of GANs in multiple tasks. Here we present ProteinSGM, a score-based generative model that produces realistic de novo proteins and can inpaint plausible backbones and functional sites into structures of predefined length. With unconditional generation, we show that score-based generative models can generate native-like protein structures, surpassing…

Cited by 21 publications (41 citation statements)
References 2 publications
“…For instance, for the Ornstein-Uhlenbeck process, we can take $c = 1/2$ and $\beta = \Theta(L_{\mathrm{sc},*}\sqrt{d})$ (see e.g. Lemma C.12 from [LLT22a]). The main distinction between Assumption 1 and the assumptions made in previous analyses for score-based generative models is the second half of Part 6 where we assume higher-order smoothness of $q^{\leftarrow}_t$.…”
Section: Statement of Results (mentioning)
confidence: 99%
“…
Methods                                   Input                 Github

Unconditional protein structure generation
Anand's model (Anand & Huang, 2018)       Noise                 PyTorch
RamaNet (Sabban & Markovsky, 2020)        Noise                 TF
Ig-VAE (Eguchi et al, 2022)               Noise                 PyTorch
FoldingDiff (Wu et al, 2022a)             Noise                 PyTorch

Protein sequence design
GraphTrans (Ingraham et al, 2019)         3D Backbone           PyTorch
GVP (Jing et al, 2020)                    3D Backbone           PyTorch
GCA (Tan et al, 2022)                     3D Backbone           PyTorch
AlphaDesign (Gao et al, 2022a)            3D Backbone           PyTorch
ESM-IF (Hsu et al, 2022)                  3D Backbone           PyTorch
ProteinMPNN (Dauparas et al, 2022)        3D Backbone           PyTorch
PiFold (Gao et al, 2022b)                 3D Backbone           PyTorch

Conditional protein design
ProteinSGM (Lee & Kim, 2022)              Masked structures     -
Wang's model (Wang et al, 2022a)          Functional sites      PyTorch
SMCDiff (Trippe et al, 2022)              Functional motifs     -
CoordVAE (Lai et al, 2022)                Backbone Template     -
CEM (Fu & Sun, 2022)                      CDR geometry          -
Tischer's model (Tischer et al, 2020)     Functional motifs     TF
Anand's model (Anand & Achim, 2022)       Multiple conditions   -
RefineGNN (Jin et al, 2021)               Antigen structure     PyTorch
DiffAb (Luo et al)                        Antigen structure     PyTorch

Forward process. We start from the standard diffusion process $x_0 \to x_1 \to \cdots \to x_T$, where the forward translation kernel from timestamp $s$ to $t$ is defined as $q(x_t \mid x_s) = \mathcal{N}(x_t; \alpha_{t|s} x_s, \sigma_{t|s}^2 I)$, $s \le t$. Denote $\alpha_t = \alpha_{t|0}$, $\sigma_t = \sigma_{t|0}$, and $q(x_0 \mid x_0) = \mathcal{N}(x_0; \alpha_0 x, \sigma_0^2 I)$, $\alpha_0 = 1$, $\sigma_0 = 0$. We will show that $\alpha_{t|s} = \alpha_t / \alpha_s$, $\sigma_{t|s}^2 = \sigma_t^2 - \alpha_{t|s}^2 \sigma_s^2$.…”
Section: Methods Input Github (mentioning)
confidence: 99%
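
The two identities at the end of the quoted forward-process passage can be checked numerically. The following is a minimal sketch, assuming scalar coefficients per timestep; the function names `transition` and `compose` and the example schedule values are hypothetical and do not come from the cited work.

```python
import math


def transition(alpha_s, sigma_s, alpha_t, sigma_t):
    """Return (alpha_{t|s}, sigma_{t|s}) from the marginal coefficients.

    Uses the identities quoted above:
        alpha_{t|s}   = alpha_t / alpha_s
        sigma_{t|s}^2 = sigma_t^2 - alpha_{t|s}^2 * sigma_s^2
    """
    alpha_ts = alpha_t / alpha_s
    var_ts = sigma_t ** 2 - alpha_ts ** 2 * sigma_s ** 2
    if var_ts < 0:
        raise ValueError("schedule is inconsistent: negative transition variance")
    return alpha_ts, math.sqrt(var_ts)


def compose(alpha_s, sigma_s, alpha_ts, sigma_ts):
    """Chain q(x_s | x_0) with q(x_t | x_s) to recover q(x_t | x_0).

    If x_s = alpha_s * x_0 + sigma_s * e1 and x_t = alpha_ts * x_s + sigma_ts * e2
    with independent standard normal e1, e2, then x_t has mean coefficient
    alpha_ts * alpha_s and variance alpha_ts^2 * sigma_s^2 + sigma_ts^2.
    """
    alpha_t = alpha_ts * alpha_s
    sigma_t = math.sqrt(alpha_ts ** 2 * sigma_s ** 2 + sigma_ts ** 2)
    return alpha_t, sigma_t


# Hypothetical marginal coefficients at two timesteps s <= t
# (chosen so that alpha^2 + sigma^2 = 1; any consistent schedule would do).
alpha_s, sigma_s = 0.9, math.sqrt(1 - 0.9 ** 2)
alpha_t, sigma_t = 0.6, 0.8

alpha_ts, sigma_ts = transition(alpha_s, sigma_s, alpha_t, sigma_t)
alpha_back, sigma_back = compose(alpha_s, sigma_s, alpha_ts, sigma_ts)
```

Composing the step from 0 to s with the step from s to t should reproduce the marginal coefficients at t, which is what `alpha_back` and `sigma_back` are meant to confirm under these assumptions.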
“…Protein Design. In addition to small molecules, biomolecules such as proteins have also attracted considerable attention from researchers (Ding et al, 2022; Ovchinnikov & Huang, 2021; Gao et al, 2020; Strokach & Kim, 2022). We divide the mainstream protein design methods into three categories: protein sequence design (Li et al, 2014; Wu et al, 2021; Pearce & Zhang, 2021; Ingraham et al, 2019; Jing et al, 2020; Tan et al, 2022; Gao et al, 2022a; Hsu et al, 2022; Dauparas et al, 2022; Gao et al, 2022b; O'Connell et al, 2018; Wang et al, 2018; Qi & Zhang, 2020; Strokach et al, 2020; Chen et al, 2019; Zhang et al, 2020; Anand & Achim, 2022), unconditional protein structure generation (Anand & Huang, 2018; Sabban & Markovsky, 2020; Eguchi et al, 2022; Wu et al, 2022a), and conditional protein design (Lee & Kim, 2022; Wang et al, 2022a; Trippe et al, 2022; Lai et al, 2022; Fu & Sun, 2022; Tischer et al, 2020; Anand & Achim, 2022). Protein sequence design aims to discover protein sequences that fold into the desired structure, and unconditional protein structure generation focuses on generating new protein structures from noisy inputs.…”
Section: Related Work (mentioning)
confidence: 99%