2021
DOI: 10.1101/2021.12.14.472723
Preprint

Nfeature: A platform for computing features of nucleotide sequences

Abstract: In the past few decades, public repositories of nucleotide sequences have grown at an exponential rate. This poses a major challenge to researchers seeking to predict the structure and function of nucleotide sequences. To annotate the function of nucleotide sequences, it is important to compute features/attributes that machine learning techniques can use to predict function. In the last two decades, several software tools and platforms have been developed to compute a wide range of features for nucleotide sequences. In…

citations
Cited by 11 publications
(9 citation statements)
references
References 35 publications
“…In order to train our model, we are required to generate features or descriptors corresponding to each mRNA. For the aforementioned purpose, we have used the tool ‘Nfeature’ [20] [https://doi.org/10.1101/2021.12.14.472723], which can generate hundreds of features for a single mRNA sequence. These are the two feature classes which were used for training the models: Composition of DNA/RNA for k-mer (CDK): k-mers of length 3 were generated by Nfeature and the frequency of each k-mer was used as a feature for training the ML model.…”
Section: Methods (mentioning)
confidence: 99%
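The k-mer composition described in the statement above can be illustrated with a minimal sketch. This is not Nfeature's code: the function name `kmer_composition`, the overlapping sliding window, the normalization by the total number of windows, and the mapping of U to T are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=3, alphabet="ACGT"):
    """Return the relative frequency of every possible k-mer in seq.

    Hypothetical helper: overlapping windows, normalized by window count.
    """
    # Assumption: RNA input is mapped onto the DNA alphabet (U -> T).
    seq = seq.upper().replace("U", "T")
    # Slide a window of length k over the sequence, one position at a time.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing ambiguous symbols such as N.
    allowed = set(alphabet)
    counts = Counter(w for w in windows if set(w) <= allowed)
    total = sum(counts.values())
    # Emit all 4**k k-mers so every sequence yields a vector of equal length.
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product(alphabet, repeat=k)
    }


features = kmer_composition("ATGATGA", k=3)
print(len(features), features["ATG"])
```

With k = 3 this gives 64 values per sequence, one per trinucleotide, which is the kind of fixed-length vector a machine learning model can be trained on.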
“…We have generated a wide range of features like Position-Specific Tri-Nucleotide Propensity (PSTNPP), Electron-Ion Interaction Pseudopotentials of trinucleotide (EIIIP; He et al, 2018), dimer count, trimer count, motif counts, GC and AT skew (Rahman et al, 2019a), Dinucleotide Auto-Correlation (DAC), Dinucleotide Cross-Correlation (DCC), Dinucleotide Auto Cross-Correlation (DACC; Friedel et al, 2009), Moran Auto-Correlation (MAC), Normalized Moreau-Broto Auto-Correlation (NMBAC; Chen et al, 2015), and Parallel Correlation Pseudo Tri-Nucleotide Composition (PC_PTNC; Liu et al, 2014), which resulted in 8465 features. The aforementioned features were calculated using Nfeature webserver (Mathur et al, 2021) available at https://webs.iiitd.edu.in/raghava/nfeature/. Then, we have used the Min-Max scaler from the scikit-learn library (Pedregosa et al, 2011) to scale down the values of the features, we have constructed.…”
Section: Methods (mentioning)
confidence: 99%
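Two of the steps named in the statement above, GC/AT skew and Min-Max scaling, can be sketched in plain Python. The skew formula used here, (G - C) / (G + C) and (A - T) / (A + T), is the commonly used definition and is assumed rather than taken from Nfeature; the helper names `skew`, `gc_at_skew`, and `min_max_scale` are hypothetical, and the scaling is a hand-written stand-in for the scikit-learn scaler the authors mention, not a call to that library.

```python
def skew(seq, a, b):
    """Return (count(a) - count(b)) / (count(a) + count(b)), or 0.0 if both are absent."""
    n_a, n_b = seq.count(a), seq.count(b)
    return (n_a - n_b) / (n_a + n_b) if (n_a + n_b) else 0.0


def gc_at_skew(seq):
    """Compute GC and AT skew for one sequence (assumed definition, see lead-in)."""
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA as DNA
    return {"GC_skew": skew(seq, "G", "C"), "AT_skew": skew(seq, "A", "T")}


def min_max_scale(rows):
    """Rescale each column of a feature matrix to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    # A constant column has no spread, so it is mapped to 0.0.
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


print(gc_at_skew("GGGCAT"))
print(min_max_scale([[1, 10], [3, 10], [5, 10]]))
```

Scaling each column independently keeps features with large raw ranges, such as counts, from dominating features that are already bounded, such as skews.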
“…A number of feature encoding techniques have been used in previous studies [27][28][29][30]. We used a standalone tool called Pfeature to compute numerous features for the proteins, including evolutionary information-based features and composition-based features [31].…”
Section: Feature Generation (mentioning)
confidence: 99%