2020
DOI: 10.1093/bioinformatics/btaa003

UDSMProt: universal deep sequence models for protein classification

Abstract: Motivation: Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches for protein classification are tailored to single classification tasks and rely on handcrafted features, such as position-specific scoring matrices from expensive database searches. We argue that this level of performance can be reached or even be surpassed by learning a task-agnostic representation once, using self-supervised language mo…
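The abstract's core idea, learning a task-agnostic representation from unlabeled sequences via self-supervised language modeling, can be illustrated with a deliberately simplified sketch (a bigram next-residue model standing in for the paper's recurrent language model; the corpus and all names here are hypothetical):

```python
# Hypothetical simplification of self-supervised pretraining on protein
# sequences: the "labels" are just the next residues, so no annotation,
# handcrafted features, or database searches are needed.
from collections import Counter, defaultdict

def train_bigram_lm(sequences):
    """Estimate P(next residue | current residue) from unlabeled sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {aa: c / total for aa, c in nxt_counts.items()}
    return probs

# Toy corpus of unlabeled amino-acid sequences.
corpus = ["MKTAYIAKQR", "MKTLLLTLVV", "MKVKVLSLLV"]
lm = train_bigram_lm(corpus)
print(lm["M"]["K"])  # 'K' always follows 'M' in this toy corpus -> 1.0
```

In UDSMProt itself the representation learned this way is reused across several classification tasks; the bigram counts above merely show the shape of the self-supervised objective.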


Cited by 147 publications (91 citation statements) · References 31 publications
“…This observation is interesting considering the fact that for alleles with fewer than 1000 training measurements, MHCFlurry was pretrained on an augmented training set with measurements from BLOSUM similar alleles, USMPep_LM_ens was pretrained on a large corpus of unlabeled peptides and USMPep_FS_ens in contrast only saw the training sequences corresponding to one MHC molecule. These results stress that further efforts might be required to truly leverage the potential of unlabeled peptide data in order to observe similar improvements as seen for proteins [15] in particular for small datasets.…”
Section: Kim14 Dataset (confidence: 59%)
“…The approach builds on the UDSMProt-framework [15] and related work in natural language processing [16]. We distinguish two variants of our approach, either train the regression from scratch or employ language model pretraining.…”
Section: USMPep: Universal Sequence Models for Peptide Binding Prediction (confidence: 99%)
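The two variants distinguished in the statement above can be sketched as follows (a hypothetical minimal setup, not the authors' implementation): both variants share the same model; they differ only in whether the token embedding is initialized randomly or reused from a pretrained language model.

```python
# Sketch of "from scratch" vs. "language model pretraining" (names and
# dimensions hypothetical): the variants differ only in embedding init.
import numpy as np

VOCAB = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
EMB_DIM = 8
rng = np.random.default_rng(0)

def make_embedding(pretrained=None):
    """From scratch: random init. Pretrained: reuse language-model weights."""
    if pretrained is not None:
        return pretrained.copy()
    return rng.normal(0, 0.1, size=(len(VOCAB), EMB_DIM))

def encode(seq, emb):
    """Mean-pooled embedding as a fixed-size peptide representation."""
    idx = [VOCAB.index(aa) for aa in seq]
    return emb[idx].mean(axis=0)

# Variant 1: train the regression from scratch on random embeddings.
emb_scratch = make_embedding()
# Variant 2: start from weights learned by a pretrained language model
# (stood in here by a fixed matrix; in practice loaded from a checkpoint).
pretrained_weights = rng.normal(0, 0.1, size=(len(VOCAB), EMB_DIM))
emb_pretrained = make_embedding(pretrained=pretrained_weights)

x = encode("MKTAYI", emb_pretrained)
print(x.shape)  # (8,)
```

A regression head trained on top of either `encode` output would complete the pipeline; the transfer benefit comes entirely from the initialization.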
“…[14] This type of encoding method has been demonstrated to be extremely useful in certain tasks [12,14-16]. Before encoding a sequence as dense numeric vectors, the sequence is typically represented as an integer vector in which each token is represented by a unique integer. The final method is to design handcrafted features and then take these features as input for modeling.…”
Section: Basic Concepts in Deep Learning (confidence: 99%)
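The integer-then-dense encoding pipeline described in the quoted statement can be illustrated in a few lines (the embedding values below are random placeholders, not trained weights):

```python
# Map each residue token to a unique integer, then look those integers up
# in a dense embedding matrix -- the standard first layer of a sequence model.
import numpy as np

TOKENS = "ACDEFGHIKLMNPQRSTVWY"
token_to_int = {aa: i for i, aa in enumerate(TOKENS)}  # 'A'->0, 'C'->1, ...

def to_integers(seq):
    """Integer-encode an amino-acid sequence, one unique id per token."""
    return [token_to_int[aa] for aa in seq]

rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(TOKENS), 4))  # 4-dim dense vector per token

ids = to_integers("MKT")
dense = embedding[ids]  # shape (3, 4): one dense vector per residue
print(ids)  # [10, 8, 16]
```

In a trained model the embedding matrix is learned jointly with the rest of the network, so the dense vectors end up encoding residue properties relevant to the task.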