2022
DOI: 10.1186/s12859-022-04873-x

TMbed: transmembrane proteins predicted through language model embeddings

Abstract: Background: Despite the immense importance of transmembrane proteins (TMP) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4–5 times underrepresented compared to non-TMPs. Today’s top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions. Results: Here, we present TMbed, a novel method inputting embeddi…

Citations: cited by 46 publications (68 citation statements)
References: 91 publications
“…predicting CATH (63) classes (26), LambdaPP will be updated to extend its breadth. All feature prediction methods integrated into the LambdaPP webserver currently use ProtT5 (24) that, in our hands, outperformed ESM-1b (55) and others (4; 9; 25; 24) for numerous applications (38-40; 43; 65; 13; 26; 73). This consistency also increases speed as the generation of embeddings becomes a limiting step.…”
Section: Introduction (citation type: mentioning)
confidence: 76%
“…Panel A: residue-level features: secondary structure, transmembrane topology, disordered residues, small molecule, nucleic or metal binding residues, residue conservation and average variation (24; 40; 43; 13; 30); Panel B: sequence-level features: predicted subcellular localization (65), and an excerpt of predicted GO-annotations (39); Panel C: effect of SAVs (wild-type sequence on x-axis, mutations on y-axis; darker color = higher effect) (43); and Panel D: predicted 3D structure (46). Interactive version at https://embed.predictprotein.org/o/Q9NZC2.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…In particular, we unconditionally (i.e., without priors on family, function or structure) generated a set of 100,000 protein sequences using ProtGPT2, and predicted secondary structure [22], Gene Ontology (GO) terms [44], residue ability to bind small molecules, nucleotides or metals [47], protein subcellular localization [46], transmembrane topology [41], residue conservation [42], residue disorder [43] and CATH family [45]. Remarkably, this generated a repertoire of 100,000 protein sequences with ∼12 predicted features of structure and function from a single script in approximately 3.5 hours (Supplement 1).…”
Section: Introduction (citation type: mentioning)
confidence: 99%