2022
DOI: 10.1007/s11135-022-01480-z

Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods

Abstract: Highly cited papers are influenced by external factors that are not directly related to the document's intrinsic quality. In this study, 50 characteristics for measuring the performance of 68 highly cited papers, from the Journal of The American Medical Informatics Association indexed in Web of Science (WOS), from 2009 to 2019 were investigated. In the first step, a Pearson correlation analysis is performed to eliminate variables with zero or weak correlation with the target (“dependent”) variable (number of c…
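
The correlation-based screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the feature names, the toy values, and the cutoff of 0.3 are assumptions made purely for illustration, since the visible part of the abstract does not state the threshold that was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)


def filter_by_correlation(features, target, threshold=0.3):
    """Keep features whose |r| with the target reaches the (assumed) threshold."""
    kept = {}
    for name, values in features.items():
        r = pearson(values, target)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Toy data, hypothetical values only (not taken from the study).
citations = [1, 2, 3, 4, 5]
features = {
    "feature_a": [2, 4, 6, 8, 10],   # perfectly positively correlated
    "feature_b": [5, 4, 3, 2, 1],    # perfectly negatively correlated
    "feature_c": [1, -1, 0, -1, 1],  # uncorrelated with the target
}

kept = filter_by_correlation(features, citations)
print(sorted(kept))
```

Filtering on the absolute value of r keeps strongly negative predictors as well as strongly positive ones; only features with little linear association with the target are dropped.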

Citations: Cited by 11 publications (4 citation statements)
References: References 98 publications
“…We adopted a hybrid method for the MGMT-based feature selection process, combining ranking-based feature weighting and filter and embedding-based feature selection methods [2,26] in this study. The mRMR and LASSO methods are among the most commonly used and important algorithms for the feature selection process in the most highly cited papers in the literature [58,59]. In this study, we utilized the advantages (i.e., power) of both different types of feature selection methods (i.e., filter and embedded) by combining two different algorithms via a rank-based weighting methodology.…”
Section: Proposed Scheme (mentioning)
confidence: 99%
“…Compared to the used approach, the PCA approach does not provide a clear list of variables to be removed (or simply not collected), but the transformation needs to be performed every time new data is collected, before using the developed regression model, possibly adding time to the very fast prediction time of the developed MLP [28]. A more direct approach would be the application of Lasso or Ridge regression to determine the coefficients of the variables and lower them to zero to eliminate the low-influenced ones from the dataset [29]. Still, RF does have some benefits in comparison to these methods such as robustness to non-linearity and multicollinearity, lower sensitivity to outliers and unscaled features, and automatic variable interaction capturing [30,31].…”
Section: Feature Importance (mentioning)
confidence: 99%
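
The idea of shrinking coefficients to eliminate low-influence variables can be made concrete with a small sketch. It assumes an orthonormal design matrix, a simplifying special case in which both estimators have closed forms in terms of the ordinary least-squares coefficient. The coefficient values and the penalty strength below are hypothetical. In this setting Lasso sets small coefficients exactly to zero, whereas Ridge only scales them toward zero, so only Lasso yields a variable list directly.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: Lasso coefficient for an orthonormal design.

    Minimises 0.5 * ||y - Xw||^2 + lam * ||w||_1 coordinate-wise,
    where b is the ordinary least-squares coefficient.
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def ridge_shrink(b, lam):
    """Ridge coefficient for an orthonormal design.

    Minimises 0.5 * ||y - Xw||^2 + 0.5 * lam * ||w||_2^2 coordinate-wise.
    """
    return b / (1.0 + lam)


# Hypothetical OLS coefficients and penalty strength, for illustration only.
ols = {"x1": 2.0, "x2": -0.25, "x3": 0.5}
lam = 0.5

lasso = {name: lasso_shrink(b, lam) for name, b in ols.items()}
ridge = {name: ridge_shrink(b, lam) for name, b in ols.items()}

# Features that survive Lasso are those with a non-zero coefficient.
selected = [name for name, w in lasso.items() if w != 0.0]
print("lasso:", lasso)
print("ridge:", ridge)
print("selected by lasso:", selected)
```

With real, correlated predictors no such closed form exists and an iterative solver is needed, but the qualitative behaviour (exact zeros for Lasso, uniform shrinkage for Ridge) carries over.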
“…1) Linear model [8]: In the models used in this article, Lasso, Ridge, and Enet are all linear regression models, with the main difference being the regularization term of the loss function. Lasso uses the L1 normal form of coefficient w, Ridge uses the L2 normal form of coefficient w, and ENet blends the first two together.…”
Section: Modeling (mentioning)
confidence: 99%
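
The difference between the three regularization terms can be written out directly. The sketch below is illustrative only: the function names are made up here, and the elastic-net form shown is one common parametrization (the one used by scikit-learn's `ElasticNet`), not necessarily the one used in the citing article.

```python
def l1_penalty(w):
    """Lasso term: the L1 norm, i.e. the sum of absolute coefficient values."""
    return sum(abs(c) for c in w)


def l2_penalty(w):
    """Ridge term: the squared L2 norm, i.e. the sum of squared coefficients."""
    return sum(c * c for c in w)


def elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5):
    """Elastic-net term blending the two penalties.

    alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    """
    return (alpha * l1_ratio * l1_penalty(w)
            + 0.5 * alpha * (1.0 - l1_ratio) * l2_penalty(w))


w = [3.0, -4.0, 0.0]  # hypothetical coefficient vector
print(l1_penalty(w))                         # L1 term
print(l2_penalty(w))                         # squared L2 term
print(elastic_net_penalty(w, l1_ratio=0.5))  # equal blend of both
```

Setting `l1_ratio=1.0` recovers the pure L1 term and `l1_ratio=0.0` leaves only the (halved) squared L2 term, which is why the elastic net is usually described as interpolating between Lasso and Ridge.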