2021
DOI: 10.1093/bib/bbab011

Feature extraction approaches for biological sequences: a comparative study of mathematical features

Abstract: As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biologic…

Cited by 30 publications (31 citation statements)
References: 119 publications
“…First, the preprocessing step for removing duplication of genes is followed by representing genes as DNA FASTA sequences and removing redundancies of sequences. Second, from the DNA sequences, the most significant features are extracted using different feature extraction [spilled table text: 24 2020 Diagnosing between different cases lncRNAs DFT, Entropy, RefSeq, GreeNC Complex Network Ensembl (v87, v32) Bi et al 23 2021 Predicting PD-related genes brain regions CERNNE PPMI ACC = 88.57%]…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…The proposed system represents all genes as DNA FASTA sequences to get all essential and distinguishing information. We extract the most significant features of these FASTA sequences using five numerical Fourier transform [19,24] and PyFeat method with AB as a feature selection technique [15]. The selected features are fed to the GBDT technique to aid in the diagnosis of different test cases.…”
Section: Related Work | Citation type: mentioning
confidence: 99%
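
The citation statement above mentions extracting Fourier-transform features from numerically mapped DNA sequences. As a rough illustration of that general idea only, the sketch below maps a sequence to four binary indicator signals (one per base), computes a naive discrete Fourier transform of each, and summarizes the summed power spectrum with a few statistics. The binary mapping, the choice of statistics, and all function names (`binary_mapping`, `power_spectrum`, `fourier_features`) are assumptions made for this example; they are not taken from the cited papers and may differ from the mappings those authors used.

```python
import cmath
import math


def binary_mapping(seq, base):
    """Indicator signal: 1.0 where `seq` has `base`, else 0.0.

    This is one possible numerical mapping, chosen for illustration.
    """
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def power_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]|^2 for k = 0..n-1."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        x_k = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(signal)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


def fourier_features(seq):
    """Sum the per-base power spectra and summarize them (hypothetical feature set).

    Assumes len(seq) >= 2 so that at least one non-DC frequency exists.
    """
    n = len(seq)
    total = [0.0] * n
    for base in "ACGT":
        for k, p in enumerate(power_spectrum(binary_mapping(seq, base))):
            total[k] += p
    body = total[1:]  # drop the k = 0 (DC) term, which only reflects base counts
    peak = max(body)
    return {
        "mean": sum(body) / len(body),
        "max": peak,
        "min": min(body),
        "peak_index": 1 + body.index(peak),
    }


# A toy sequence with an exact period of 3 bases.
features = fourier_features("ATGATGATGATG")
```

For a sequence of length n with a strict period of 3, the spectral energy concentrates at k = n/3 and its mirror k = 2n/3, which is why period-3 peaks are often used as a signal in sequence analysis. A feature vector like `features` could then be passed to a feature selection step and a classifier, as the quoted pipeline describes.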