2021
DOI: 10.3389/fpls.2021.506681

Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method

Abstract: Motivation: Pentatricopeptide repeat (PPR), which is a triangular pentapeptide repeat domain, plays an important role in plant growth. Features extracted from sequences are applicable to PPR protein identification using certain classification methods. However, which components of a multidimensional feature (namely variables) are more effective for protein discrimination has never been discussed. Therefore, we seek to select variables from a multidimensional feature for identifying PPR proteins. Method: A framew…

Cited by 23 publications (14 citation statements)
References 23 publications
“…The noise in feature vector might result in the unsatisfactory performance of a model [56] , [57] , [58] , [59] , [60] , [61] , [62] , [63] . Therefore, the selection of features is an obligatory phase to remove the less important features and increase the productivity of a model [37] , [64] , [65] , [66] , [67] , [68] , [69] . Many feature selection and ranking techniques are available, such as ANOVA, F-score [70] , mRMR [27] , Chi-square [71] , LGBM [72] , [73] .…”
Section: Methods (mentioning)
confidence: 99%
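The F-score mentioned in the statement above is a standard univariate ranking criterion for two-class data. As an illustration only (this is the textbook Chen & Lin definition, not code from the cited paper), a minimal pure-Python sketch:

```python
# Fisher F-score ranking for a two-class dataset: for each feature, the
# between-class scatter of the means divided by the sum of the within-class
# sample variances. Higher scores indicate more discriminative features.
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of one feature, given its values in positive/negative samples."""
    all_vals = pos + neg
    m, mp, mn = mean(all_vals), mean(pos), mean(neg)
    numerator = (mp - m) ** 2 + (mn - m) ** 2
    denominator = variance(pos) + variance(neg)  # sample variances (n - 1)
    return numerator / denominator

def rank_features(X_pos, X_neg):
    """Return feature indices sorted by descending F-score.

    X_pos / X_neg: lists of samples, each sample a list of feature values."""
    n_feat = len(X_pos[0])
    scores = [f_score([s[i] for s in X_pos], [s[i] for s in X_neg])
              for i in range(n_feat)]
    return sorted(range(n_feat), key=lambda i: scores[i], reverse=True)
```

In practice the top-ranked features are kept and the rest discarded, which is the "obligatory phase" of removing less important features that the quoted passage describes.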
“…Feature redundancy or dimensionality disasters often occur during feature extraction. Feature selection not only reduces the risk of overfitting but also improves the model’s generalization ability and computational efficiency ( Guo et al, 2020 ; Yang et al, 2021a ; Ao et al, 2021b ; Zhao et al, 2021 ). In the present paper, we use the max relevance max distance (MRMD) feature selection method to reduce the dimensions of the initial feature set ( He et al, 2020 ).…”
Section: Methods (mentioning)
confidence: 99%
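The MRMD method referenced above combines two criteria: relevance of each feature to the label and distance between features (so that redundant, near-duplicate features score low). A simplified sketch of that idea, assuming Pearson correlation for relevance and mean Euclidean distance between feature columns for distance (the published MRMD tool differs in details; this is an illustration, not its implementation):

```python
# Simplified max-relevance-max-distance (MRMD-style) feature selection:
# score(feature) = |corr(feature, label)| + mean distance to other features,
# then keep the top-k features by score.
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def mrmd_select(X, y, k):
    """X: list of samples (rows); y: 0/1 labels; returns top-k feature indices."""
    cols = list(zip(*X))  # transpose to feature columns
    d = len(cols)
    relevance = [abs(pearson(list(c), y)) for c in cols]

    def dist(i, j):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(cols[i], cols[j])))

    distance = [sum(dist(i, j) for j in range(d) if j != i) / (d - 1)
                for i in range(d)]
    scores = [r + s for r, s in zip(relevance, distance)]
    return sorted(range(d), key=lambda i: scores[i], reverse=True)[:k]
```

Weighting the two terms differently trades off label relevance against redundancy removal; the equal weighting here is the simplest choice.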
“…The SE and SP metrics measure the predictive ability of the model for positive and negative samples, respectively. The other three metrics, ACC, Q, and MCC, reflect the overall performance and stability of the model [68,69]. Furthermore, receiver operating characteristic (ROC) curves are used to assess the real performance of the model more intuitively.…”
Section: Measurement (mentioning)
confidence: 99%
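The metrics named in the statement above have standard confusion-matrix definitions. A worked sketch (textbook formulas, not code from the cited work):

```python
# Binary-classification metrics from confusion-matrix counts:
#   SE  (sensitivity)  = TP / (TP + FN)   -- recall on positives
#   SP  (specificity)  = TN / (TN + FP)   -- recall on negatives
#   ACC (accuracy)     = (TP + TN) / total
#   MCC (Matthews correlation coefficient), robust to class imbalance.
import math

def binary_metrics(tp, tn, fp, fn):
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}
```

As the quoted passage notes, SE and SP separate performance on the two classes, while ACC and MCC summarize overall behavior; MCC in particular stays informative when positives and negatives are unbalanced.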