2020
DOI: 10.3389/fbioe.2020.01032

Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA

Abstract: Deoxyribonucleic acid (DNA) is a biological macromolecule. Its main function is information storage. At present, the advancement of sequencing technology has caused DNA sequence data to grow at an explosive rate, which has also pushed the study of DNA sequences into the wave of big data. Moreover, machine learning is a powerful technique for analyzing large-scale data and learns spontaneously to gain knowledge. It has been widely used in DNA sequence data analysis and has produced many research achievements. Firs…

Cited by 108 publications (55 citation statements)
References 39 publications
“…Classification is a very important method of data mining [27]. The concept of classification is to learn a classification function or construct a classification model on the basis of existing data, which is commonly referred to as a classifier.…”
Section: Basic Theoretical Knowledge Of Metalearning
type: mentioning
confidence: 99%
“…Identifying a GMO in a food processed from several species means that the analysts have to manage a larger matrix than previously performed. Design of experiments (DOE), machine learning, artificial neural networks, fuzzy logic, or genetic algorithms are some of the available tools to manage the big data that knowledge matrices could become (Alley et al, 2020; Nielsen and Voigt, 2018; Sivarajah et al, 2017; Yang et al, 2020; Yin et al, 2017). The management of millions of SNPs used in genomic selection show it is easily manageable.…”
Section: Signatures and Scars In Processed Products
type: mentioning
confidence: 99%
“…These big data could then be analysed by the species and mutagenesis category used to distinguish the similarities and differences, at least genetic, caused to species not mutated by man. Despite the numerous errors present in the sequence bases (Bertheau, 2019; Steinegger and Salzberg, 2020; Tang, 2020), the large number of sequences available, whether or not from GMOs, should, with a reasoned use of various statistical and DSS software and artificial intelligence, make it possible to distinguish scars and signatures (Alley et al, 2020; Block et al, 2013; Cadzow et al, 2014; Guillot et al, 2014; Interdonato et al, 2020; Koumakis, 2020; Nielsen and Voigt, 2018; Yang et al, 2020). Finally, some experiments should be enough to demonstrate the universality of the concept.…”
Section: Proof Of Concept
type: mentioning
confidence: 99%
“…Their results show that the SVM is superior to the Fisher linear discrimination classifier based on 10-fold cross-validation by 14.8%. Yang et al (2020) provided a review that introduced sequencing technology development and explains the structure of DNA sequence data and sequence similarity. Second, they analyzed the necessary DM process, summarized several of the significant ML algorithms, and highlighted the future challenges faced by ML algorithms in extracting biological sequence data and possible future solutions.…”
Section: Different Applications
type: mentioning
confidence: 99%