2022
DOI: 10.1016/j.neucom.2021.07.102

ProPythia: A Python package for protein classification based on machine and deep learning

Cited by 16 publications (11 citation statements)
References 45 publications
“…Gradient Boosting (GBC) and K-nearest neighbor (KNN) models also performed well. Our study further supported the evidence that tree-based models outperformed other algorithms in classification tasks using tabular data such as modlAMP descriptors 49, 73–75 or other features 76. Aware of the possible over-representation of α-helices among AMPs 29, we estimated the structural landscapes of our model datasets, revealing a majority of peptides assuming α-helical and loose structures (subset I − 53.9%), two minorly represented coiled structures (III − 33.1%) and mixed structures (IV − 13.0%), and the complete absence of β-stranded peptides (II).…”
Section: Discussion (supporting)
confidence: 82%
“…These observations support the recent statement that tree-based models RFC and GBC performed very well in classifying tabular data 72. Previous studies have previously demonstrated that tree-based models outperformed other algorithms in classification or regression tasks using modlAMP descriptors 49, 73–75 or other features 76.…”
Section: Results (mentioning)
confidence: 99%
“…These are sequence-based features, which are elemental to any protein and are widely employed in different protein identification and classification tasks. Additionally, other descriptor sets like PAAC (Pseudo amino acid composition), CKSAAP (Amino acid composition), and Moran (Autocorrelation) were also identified among the important features, which are also known to play a significant role in the classification of proteins, including NR. PAAC, which represents the positional and compositional pattern of an amino acid in protein sequences, preserves the evolutionary patterns within the protein families. Therefore, it is widely used in problems like predicting various post-translational modification sites or identifying protein subcellular localization where evolutional relationships of residues are important. …”
Section: Results (mentioning)
confidence: 99%