2022
DOI: 10.1371/journal.pone.0267106

WalkIm: Compact image-based encoding for high-performance classification of biological sequences using simple tuning-free CNNs

Abstract: The classification of biological sequences is an open issue for a variety of data sets, such as viral and metagenomics sequences. Therefore, many studies utilize neural network tools, as the well-known methods in this field, and focus on designing customized network structures. However, a few works focus on more effective factors, such as input encoding method or implementation technology, to address accuracy and efficiency issues in this area. Therefore, in this work, we propose an image-based encoding method…

Cited by 5 publications (4 citation statements)
References 49 publications
“…These datasets are also chosen from various taxonomy levels and identities to evaluate the capability of PC-mer method against the state-of-art encoding methods, i.e. FCGR [11, 12, 20–22] and WalkIm [23]. These datasets, alongside HLT_datasets, have also been used for the second assessment.…”
Section: Results; citation type: mentioning
confidence: 99%
“…First of all, we examine the accuracy of various architectures of machine learning-based classifiers employing the PC-mer feature extraction approach to build their input vectors for classifying various levels of metagenomics data. We then compared our best results to the best classification performances provided by the RDP classifier [7], the reference classifier for bacteria identification, WalkIm [23], and finally, CNN-DBN classifier [11]. It should be noted that all simulation results for various classifier methods have been reported in S1 File in details.…”
Section: Results; citation type: mentioning
confidence: 99%
“…To emphasize the importance of optical implementation of the forward inference in CNNs, optical processing of the large biological data sequences is explored as follows. As discussed in [45], classification of virus sequences (e.g., Coronaviruses, Dengue, HIV, Hepatitis B and C, and Influenza A), metagenomics data, and metabarcoding data can be performed by CNNs taking advantages of an appropriate image-based encoding method. It should be noted that single training procedure is carried out for each biological dataset while many test procedures are required to classify the input [44].…”
Section: Speed Comparison; citation type: mentioning
confidence: 99%
“…The problem of how to transform 2D data into 1D data or whether to do so at all is of interest in many different areas, such as in the prediction of molecular properties on the bases of molecular structures [5] or in the case of automated methods for detecting viral subtypes using genomic data [6]. The choice often depends on the selected transformers and classifiers.…”
Section: Introduction; citation type: mentioning
confidence: 99%