Evaluation of convolutionary neural networks modeling of DNA sequences using ordinal versus one-hot encoding method

Choong, Allen Chieng Hoon; Lee, Nung Kion

doi:10.1109/iconda.2017.8270400

Cited by 39 publications

(18 citation statements)

References 16 publications


“…When processing the DNA sequence, it is necessary to convert the string sequence into a numerical value, so as to form a matrix input model training. Generally speaking, there are three methods for sequence encoding: sequential encoding, one-hot encoding, and k-mer encoding ( Choong and Lee, 2017 ). The characteristics of the three DNA encoding methods are shown in Table 1 .…”

Section: Basic Knowledge of DNA (mentioning)

confidence: 99%

Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA

Yang

Zhang

Wang

et al. 2020

Front. Bioeng. Biotechnol.

108


Deoxyribonucleic acid (DNA) is a biological macromolecule. Its main function is information storage. At present, the advancement of sequencing technology has caused DNA sequence data to grow at an explosive rate, which has also pushed the study of DNA sequences into the wave of big data. Moreover, machine learning is a powerful technique for analyzing large-scale data and learns spontaneously to gain knowledge. It has been widely used in DNA sequence data analysis and has produced many research achievements. Firstly, the review introduces the development process of sequencing technology and expounds on the concept of DNA sequence data structure and sequence similarity. Then we analyze the basic process of data mining, summarize several major machine learning algorithms, and put forward the challenges faced by machine learning algorithms in the mining of biological sequence data and possible solutions in the future. Then we review four typical applications of machine learning in DNA sequence data: DNA sequence alignment, DNA sequence classification, DNA sequence clustering, and DNA pattern mining. We analyze their corresponding biological application background and significance, and systematically summarize the development and potential problems in the field of DNA sequence data mining in recent years. Finally, we summarize the content of the review and look into the future of some research directions for the next step.
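The citation statement for this entry names three ways of turning a DNA string into numbers: sequential (ordinal) encoding, one-hot encoding, and k-mer encoding. The following is a minimal sketch of what each might look like, not the implementation from either paper. The function names, the ordinal values (0.25 to 1.00), the A, C, G, T column order, and the default k=3 are all assumptions made for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed column/vocabulary order


def ordinal_encode(seq):
    """Map each base to a single number. The specific values are an assumption."""
    values = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}
    return [values[base] for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator row, giving a len(seq) x 4 matrix."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_encode(seq, k=3):
    """Map each overlapping k-mer to its index in the sorted 4**k vocabulary."""
    vocab = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}
    seq = seq.upper()
    return [vocab[seq[i:i + k]] for i in range(len(seq) - k + 1)]


ordinal = ordinal_encode("ACGT")
one_hot = one_hot_encode("AC")
kmers = kmer_encode("ACGT", k=3)
```

In this sketch an ambiguous base such as `N` raises a `KeyError`; a real pipeline would need an explicit policy for such symbols.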


“…As the input_shape format of Conv3d required, the data dimensions were adjusted to suitable inputs using the transpose method. Finally, the input data is X, the label is Y, and the label Y is processed by one-hot encoding [25], which makes the feature calculation among features more reasonable and improves the computing speed. The calculation method is shown in Figure 4.…”

Section: Proposed Methods (mentioning)

confidence: 99%

Lite-3DCNN Combined with Attention Mechanism for Complex Human Movement Recognition

Zhu

Bin

Sun

2022

Computational Intelligence and Neuroscience


Three-dimensional convolutional network (3DCNN) is an essential field of motion recognition research. The research work of this paper optimizes the traditional three-dimensional convolution network, introduces the self-attention mechanism, and proposes a new network model to analyze and process complex human motion videos. In this study, the average frame skipping sampling and scaling and the one-hot encoding are used for data pre-processing to retain more features in the limited data. The experimental results show that this paper innovatively designs a lightweight three-dimensional convolutional network combined with an attention mechanism framework, and the number of parameters of the model is reduced by more than 90% to only about 1.7 million. This study compared the performance of different models in different classifications and found that the model proposed in this study performed well in complex human motion video classification. Its recognition rate increased by 1%–8% compared with the C3D model.


“…Filters slide over rows of the matrix (words), performing convolutions on the one-hot vector and generating feature maps. Since all neurons in the feature map scan the same feature of the previous layer but from different locations, different feature maps detect different types of features (Choong and Lee, 2017).…”

Section: Convolutional Layer (mentioning)

confidence: 99%

Machine Learning and Deep Learning for Phishing Email Classification using One-Hot Encoding

Nandi

Bagui

White

2021

Journal of Computer Science


Representation of text is a significant task in Natural Language Processing (NLP) and in recent years Deep Learning (DL) and Machine Learning (ML) have been widely used in various NLP tasks like topic classification, sentiment analysis and language translation. Until very recently, little work has been devoted to semantic analysis in phishing detection or phishing email detection. The novelty of this study is in using deep semantic analysis to capture inherent characteristics of the text body. One-hot encoding was used with DL and ML techniques to classify emails as phishing or nonphishing. A comparison of various parameters and hyperparameters was performed for DL. The results of various ML models, Naïve Bayes, SVM, Decision Tree, as well as DL models, Convolutional Neural Networks (CNN) and Long Short Term Memory (LSTM), were presented. The DL models performed better than the ML models in terms of accuracy, but the ML models performed better than the DL models in terms of computation time. CNN with Word Embedding performed the best in terms of accuracy (96.34%), demonstrating the effectiveness of semantic analysis in phishing email detection.
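The citation statement for this entry describes filters sliding over the rows of a one-hot matrix to produce feature maps. A rough sketch of that operation for a single filter is given below, in plain Python with no padding and a stride of one. The helper name `conv1d_valid`, the toy four-word vocabulary, and the filter weights are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def conv1d_valid(matrix, kernel):
    """Slide a (width x channels) kernel down the rows of matrix.

    Returns one value per position: the sum of element-wise products
    between the kernel and the window of rows it currently covers.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(matrix) - width + 1):
        total = 0.0
        for offset in range(width):
            for m, w in zip(matrix[start + offset], kernel[offset]):
                total += m * w
        feature_map.append(total)
    return feature_map


# One-hot rows for four tokens over an assumed 4-word vocabulary.
one_hot = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
]

# Hypothetical filter that responds to "word 1 followed by word 2".
kernel = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

feature_map = conv1d_valid(one_hot, kernel)
```

Because the same weights are applied at every position, the feature map peaks wherever the pattern occurs; a second filter with different weights would yield a separate feature map for a different pattern.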
