2021
DOI: 10.1093/bib/bbab434

MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors

Abstract: One of the main challenges in applying machine learning algorithms to biological sequence data is how to numerically represent a sequence in a numeric input vector. Feature extraction techniques capable of extracting numerical information from biological sequences have been reported in the literature. However, many of these techniques are not available in existing packages, such as mathematical descriptors. This paper presents a new package, MathFeature, which implements mathematical descriptors able to extrac…

Cited by 50 publications (35 citation statements)
References 88 publications
“…This module, which is the first feature engineering stage, extracts feature descriptors using the MathFeature package [28], e.g. Mathematical descriptors (Fourier, Shannon, Tsallis, among others) and Conventional descriptors [Nucleic Acid Composition (NAC), dinucleotide composition (DNC), trinucleotide composition (TNC), ORF Features, Xmer k-Spaced Ymer composition frequency (kGap), Fickett score, among others].…”
Section: Bioautoml Package (citation type: mentioning)
confidence: 99%
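The descriptor families named in the statement above can be illustrated with a small sketch. It is not MathFeature's API or implementation: the function names `kmer_composition`, `shannon_entropy` and `tsallis_entropy` are hypothetical, and applying the entropies to k-mer frequencies is an assumption made for illustration. With k = 1, 2, 3 the composition corresponds to NAC, DNC and TNC respectively.

```python
import math
from collections import Counter
from itertools import product


def kmer_composition(seq, k, alphabet="ACGT"):
    """Relative frequency of every k-mer over `alphabet`.

    k=1 gives NAC, k=2 gives DNC, k=3 gives TNC.
    Windows containing symbols outside `alphabet` (e.g. N) are skipped.
    """
    seq = seq.upper()
    allowed = set(alphabet)
    windows = [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    ]
    counts = Counter(windows)
    total = len(windows)
    kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


def shannon_entropy(freqs):
    """Shannon entropy (in bits) of a frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def tsallis_entropy(freqs, q=2.0):
    """Tsallis entropy S_q = (1 - sum(p^q)) / (q - 1), defined for q != 1."""
    if q == 1:
        raise ValueError("q must differ from 1; use shannon_entropy for the q -> 1 limit")
    return (1.0 - sum(p ** q for p in freqs if p > 0)) / (q - 1.0)


if __name__ == "__main__":
    nac = kmer_composition("ACGTACGGTA", k=1)
    print(nac)
    print(shannon_entropy(nac.values()), tsallis_entropy(nac.values(), q=2.0))
```

Each function returns plain numbers, so the values can be concatenated into a fixed-length feature vector for a downstream classifier.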
“…These tasks can include different ways of preprocessing or feature engineering, as well as algorithms and optimization of its parameters (hyper-parameter tuning) [24–26]. In this study, BioAutoML calls the MathFeature package [27, 28] to extract feature descriptors representing relevant numerical information from ncRNA sequences (Feature Extraction module). After receiving the feature values, BioAutoML, automatically recommends, using Bayesian Optimization [29], the best pair of selected features and predictive model.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Many feature engineering tools that target DNA, RNA, proteins and ligands were released in recent years. They include PseAAC (12), PROFEAT (13), PseAAC-Builder (14), PyDPI (15), ChemoPy (16), propy (17), RDKit (18), PseAAC-General (19), Rcpi (20), ProFET (21), protr/ProtrWeb (22), BioTriangle (23), repRNA (24), POSSUM (25), PseKRAAC (26), iFeature (27), PyFeat (28), Seq2Feature (29), MRMD2.0 (30) and MathFeature (31). Besides these feature engineering tools, several platforms for the development of machine learning predictors, including BioSeq-Analysis2.0 (32), PFeature (33), iLearn (34) and iLearnPlus (5), also provide feature extraction facilities.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Only BioTriangle covers DNA, RNA, ligands and protein sequences, however, it does not consider protein structures. MathFeature (31) and some of the recent machine learning platforms, such as BioSeq-Analysis2.0 (32), iLearn (34) and iLearnPlus (5), provide a relatively rich collection of feature sets for nucleic acids and proteins, outperforming the older feature engineering tools, however, they do not consider ligands and protein structures, except for PFeature (33) that considers only protein sequences and structures. A few current tools, including PIC (46), PDBparam (47) and PFeature, encode features from protein structure, facilitating important applications, such as rational drug development (48) and prediction of protein functions (49–52).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Nevertheless, ML algorithms applied to the analysis of biological sequences present challenges, such as feature extraction [10]. For non-structured data, as is the case of biological sequences, feature extraction is a key step for the success of ML applications [11, 12, 13].…”
Section: Introduction (citation type: mentioning)
confidence: 99%