2020
DOI: 10.3390/pr8060638

Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements

Abstract: Because of the promising results obtained by machine learning (ML) approaches in several fields, the use of ML to solve problems in bioinformatics is becoming more common every day. In genomics, a current challenge is detecting and classifying transposable elements (TEs), because the bioinformatics methods involved require tedious tasks. ML was therefore recently evaluated on TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure…

Cited by 41 publications (28 citation statements)
References 75 publications
“…In the current (post) genomic era [ 86 , 87 ], there is a need for automating TE annotation [ 39 , 44 ] to quickly analyze the huge amount of genomic data. Machine learning algorithms have become popular in bioinformatics because they provide promising results in complex tasks and given the availability of large databases.…”
Section: Discussion (mentioning)
confidence: 99%
“…We used ML algorithms such as logistic regression (LR), linear discriminant analysis (LDA), K-nearest neighbors (KNN), multi-layer perceptron with one layer (MLP), random forest (RF), decision trees (DT), naïve Bayes network (NB), and support vector machine (SVM) to test the performance of the datasets. We used the F1-score as the performance metric, which is the harmonic mean of precision and sensitivity [ 39 ] and we used it as the accuracy indicator; we used k-mer frequencies with 1 ≤ k ≤ 6 as features, and we used scaling and dimensional reduction using principal component analysis (PCA) as pre-processing steps, according to [ 39 ].…”
Section: Methods (mentioning)
confidence: 99%
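
The citation statement above mentions two ingredients that are easy to illustrate: k-mer frequencies with 1 ≤ k ≤ 6 as features, and the F1-score as the harmonic mean of precision and sensitivity. The following is a minimal sketch of both, not the cited authors' code. The function names `kmer_frequencies` and `f1_score`, the four-letter DNA alphabet, and the choice to normalize counts by the number of windows are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k_max=6, alphabet="ACGT"):
    """Return relative k-mer frequencies for every k in 1..k_max.

    Hypothetical helper: features are emitted in a fixed order
    (all 1-mers, then all 2-mers, ...), so every sequence maps to a
    vector of the same length. Windows containing letters outside
    `alphabet` (for example N) count toward the denominator only.
    """
    seq = seq.upper()
    features = []
    for k in range(1, k_max + 1):
        windows = max(len(seq) - k + 1, 0)
        counts = Counter(seq[i:i + k] for i in range(windows))
        for kmer in map("".join, product(alphabet, repeat=k)):
            features.append(counts[kmer] / windows if windows else 0.0)
    return features


def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and sensitivity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)
```

With a four-letter alphabet, this yields 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 features per sequence. The scaling and PCA pre-processing steps mentioned in the statement are omitted here; in practice they would usually be delegated to a numerical library.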