2021
DOI: 10.1002/cpz1.113

Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets

Abstract: If you already have a Python installation with a different version (e.g., 2.7) that you must keep, consider installing Python 3.8 through Anaconda ("Anaconda Software Distribution," 2020): https://docs.anaconda.com/anaconda/install. Download required files. Through your browser, navigate to http://data.bioembeddings.com/disprot and download the files: sequences.fasta, config.yml, and disprot_annotations.csv. Note that you might need to right-click and select "Save Link As" to download the files.
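As an alternative to the browser download described in the abstract, the same files could be fetched from a script. The following is a minimal sketch, not part of the original protocol. It assumes that each file lives directly under the base URL, that the third file is named `disprot_annotations.csv` (the hyphen in the extracted text is taken to be a line-break artifact), and that any interpreter from Python 3.8 upward is acceptable. The helper names `file_urls`, `python_is_at_least`, and `download_all` are hypothetical.

```python
import sys
import urllib.request
from pathlib import Path

# Base URL and file names as given in the abstract.
# Assumption: files are served at <base>/<file name>.
BASE_URL = "http://data.bioembeddings.com/disprot"
FILES = ["sequences.fasta", "config.yml", "disprot_annotations.csv"]


def file_urls(base_url=BASE_URL, files=FILES):
    """Join the base URL with each file name."""
    return [f"{base_url.rstrip('/')}/{name}" for name in files]


def python_is_at_least(major, minor):
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info >= (major, minor)


def download_all(target_dir="disprot"):
    """Fetch every file into target_dir. Requires network access."""
    out = Path(target_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in file_urls():
        destination = out / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, destination)
        print(f"saved {destination}")
```

Calling `python_is_at_least(3, 8)` before `download_all()` gives a quick check that the interpreter matches the version the abstract recommends. If the server layout differs from the assumption above, the browser route described in the abstract remains the reference procedure.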

Cited by 88 publications (81 citation statements)
References 68 publications
“…The data set including predictions for the human proteome, the source code, and the trained model are available via GitHub (https://github.com/Rostlab/bindPredict). Embeddings can be generated using the bio_embeddings pipeline [38]. In addition, bindEmbed21DL is publicly available as a standalone method as part of bio_embeddings.…”
Section: Results (mentioning)
confidence: 99%
“…All data, the source code, and the trained model are available via GitHub (https://github.com/Rostlab/bindPredict). Embeddings can be generated using the bio_embeddings pipeline [40]. In addition, bindEmbed21 and its components bindEmbed21DL and bindEmbed21HBI are publicly available through bio_embeddings.…”
Section: Littmann Et Al and B Rost (mentioning)
confidence: 99%
“…Machine learning models have been used to solve biological problems such as predicting solubility of proteins, targeting subcellular localizations, folding and more [58][59][60][61][62][63]. In this paper, we develop a machine learning model that utilizes protein sequence information, which can classify residues in mechanically stable and unstable substructures.…”
Section: Discussion (mentioning)
confidence: 99%
“…All newly developed prediction methods exclusively used embeddings from pretrained protein LMs, namely from ProtBert (Elnaggar et al 2021) based on the NLP (Natural Language Processing) algorithm BERT (Devlin et al 2019) trained on the BFD database with over 2.3 million protein sequences (Steinegger and Söding 2018), and ProtT5-XL-U50 (Elnaggar et al 2021) (for simplicity referred to as ProtT5) based on the NLP method T5 (Raffel et al 2020) trained on BFD and fine-tuned on Uniref50 (The UniProt Consortium 2021). All embeddings were obtained from the bio_embeddings pipeline (Dallago et al 2021). The per-residue embeddings were extracted from the last hidden layer of the models with size 1024×L, where L is the length of the protein sequence and 1024 is the size of the embedding space of ProtBert and ProtT5.…”
Section: Methods (mentioning)
confidence: 99%
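To make the 1024×L convention in the statement above concrete, here is a small sketch that reads FASTA text and reports the per-residue embedding shape each sequence would have. It does not run any protein language model; it only derives the shape from the sequence length. The function names and the toy sequence are hypothetical, and a given tool may store the transpose (L×1024), which is worth checking against its documentation.

```python
EMBEDDING_DIM = 1024  # embedding size of ProtBert and ProtT5, per the quoted statement


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may wrap over several lines; concatenate them.
            records[header] += line
    return records


def per_residue_shape(sequence, dim=EMBEDDING_DIM):
    """Shape of a per-residue embedding in the 1024 x L convention."""
    return (dim, len(sequence))


# Hypothetical toy input, not taken from sequences.fasta.
toy_fasta = ">toy_protein\nMKTAYIAK\nQRQISFVK\n"
shapes = {name: per_residue_shape(seq) for name, seq in read_fasta(toy_fasta).items()}
print(shapes)  # {'toy_protein': (1024, 16)}
```

The toy sequence has 16 residues split over two lines, so the reported shape is (1024, 16).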
“…Embeddings from pLMs. For conservation prediction, we used embeddings from the following pLMs: (Dallago et al 2021). As described in ProtTrans, only the encoder-side of ProtT5 was used and embeddings were extracted in half-precision.…”
Section: Input Features (mentioning)
confidence: 99%