2020
DOI: 10.1186/s12859-020-03546-x

Amino acid encoding for deep learning applications

Abstract: Background The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as continuous vectors through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the c…
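
As a hedged illustration of the embedding lookup the abstract describes (not code from the paper): each amino acid is mapped to an integer index, and that index selects one row of a matrix whose entries would normally be learned during training. The alphabet, the dimension D = 8, and the example sequence below are arbitrary choices for the sketch.

```python
import numpy as np

# The 20 standard amino acids; each gets a unique integer index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

D = 8  # embedding dimension (arbitrary for this sketch)
rng = np.random.default_rng(0)
# In a real model these weights are learned from data; here they are random.
embedding_matrix = rng.normal(size=(len(AMINO_ACIDS), D))

sequence = "MKTAYIAK"
indices = [aa_to_index[aa] for aa in sequence]
vectors = embedding_matrix[indices]  # one D-dimensional vector per residue
print(vectors.shape)                 # (8, 8)
```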

Citation Types: 2 supporting, 66 mentioning, 0 contrasting
Cited by 85 publications (68 citation statements)
References 21 publications
“…The word embedding dimension D can be lower than the alphabet size, and thus, lower than the one-hot encoding dimension. For DeepNOG, D = 10 gave best results on validation data, which reflects the findings of a recent survey on amino acid encoding schemes ( ElAbd et al , 2020 ). Consequently, each amino acid is represented as a ten-dimensional vector.…”
Section: Methods (supporting)
confidence: 83%
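
A minimal PyTorch sketch of the setup this excerpt describes: an embedding with D = 10 over the 20-letter amino acid alphabet, so each residue maps to a ten-dimensional vector instead of a 20-dimensional one-hot vector. The layer is generic PyTorch, not DeepNOG's actual code, and the batch below is synthetic.

```python
import torch
import torch.nn as nn

ALPHABET_SIZE = 20  # one-hot encoding would need 20 dimensions
D = 10              # embedding dimension reported to work best for DeepNOG

embed = nn.Embedding(ALPHABET_SIZE, D)

# A synthetic batch of one integer-encoded sequence of length 50.
residues = torch.randint(0, ALPHABET_SIZE, (1, 50))
print(embed(residues).shape)  # torch.Size([1, 50, 10]): 10-dim vector per residue
```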
“…Among them, one-hot encoding transforms a character into a binary-bit vector [42], [43]. The one-hot encoding scheme is popular because deep learning models require grid-like numeric input.…”
Section: Data Formats and Encoding Schemes (mentioning)
confidence: 99%
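
A small sketch of the one-hot scheme this excerpt describes, assuming the 20 standard amino acids: each character becomes a binary vector with a single 1 at its alphabet position, and a sequence becomes a grid-like binary matrix.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence: str) -> np.ndarray:
    """Return an (L, 20) binary matrix: one row per residue,
    one column per amino acid, exactly one 1 per row."""
    encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.int8)
    for pos, aa in enumerate(sequence):
        encoding[pos, aa_to_index[aa]] = 1
    return encoding

print(one_hot_encode("GAVL").shape)  # (4, 20): grid-like numeric input
```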
“…[13] These vectors could also be learned jointly with the main task (e.g., RT prediction or MHC-peptide binding prediction), in the same way that the weights of the neural network of the main task are learned. [14] This type of encoding method has been demonstrated to be extremely useful in certain tasks. [12,14–16] Before encoding a sequence as dense numeric vectors, the sequence is typically represented as an integer vector in which each token is represented by a unique integer.…”
Section: Basic Concepts in Deep Learning (mentioning)
confidence: 99%
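
A toy PyTorch sketch of the joint learning this excerpt describes: because the embedding weights are ordinary parameters of the model, the same backward pass that trains the task head also updates the embedding matrix. The model name, dimensions, and regression target below are hypothetical.

```python
import torch
import torch.nn as nn

class PeptideRegressor(nn.Module):
    """Hypothetical toy model: embedding trained jointly with the task head."""
    def __init__(self, vocab_size: int = 20, dim: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # learned with the main task
        self.head = nn.Linear(dim, 1)               # e.g., an RT-style regression head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, length) integer-encoded sequences
        x = self.embed(tokens).mean(dim=1)  # average the residue vectors
        return self.head(x).squeeze(-1)

model = PeptideRegressor()
tokens = torch.randint(0, 20, (4, 12))  # 4 synthetic integer-encoded sequences
loss = nn.functional.mse_loss(model(tokens), torch.zeros(4))
loss.backward()  # gradients reach the embedding, so it is learned jointly
print(model.embed.weight.grad is not None)  # True
```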
“…[14] This type of encoding method has been demonstrated to be extremely useful in certain tasks. [12,14–16] Before encoding a sequence as dense numeric vectors, the sequence is typically represented as an integer vector in which each token is represented by a unique integer. The final method is to design handcrafted features and then take these features as input for modeling.…”
Section: Basic Concepts in Deep Learning (mentioning)
confidence: 99%
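
To make the last two steps concrete, here is a sketch (with hypothetical helper names, not code from the cited works) of the integer encoding the excerpt mentions, plus one classic handcrafted feature, amino acid composition.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def to_integers(sequence: str) -> list:
    """Integer encoding: each token (residue) gets a unique integer id."""
    return [aa_to_index[aa] for aa in sequence]

def composition(sequence: str) -> np.ndarray:
    """Handcrafted feature: the fraction of each amino acid in the sequence."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in sequence:
        counts[aa_to_index[aa]] += 1
    return counts / len(sequence)

print(to_integers("GAVL"))   # [5, 0, 17, 9]
print(composition("GAVL"))   # 20 fractions summing to 1.0
```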