KHAIRIL ANWAR NOTODIPUTRO scite author profile

KHAIRIL ANWAR NOTODIPUTRO

2 Publications

3 Citation Statements Received

11 Citation Statements Given

Publications

Implementation of Winsorizing and random oversampling on data containing outliers and unbalanced data with the random forest classification method

Zubedi, Sartono, Notodiputro

2022

J. Nat.

Many studies use classification methods to find the best way to predict the class of an observation, and several of them report that random forest performs best. However, classifying data that contain outliers and are unbalanced remains a difficult problem, and much research addresses it. In this study, we propose winsorizing to handle outliers, replacing outlier values with the upper and lower limits obtained from the interquartile range method, and random oversampling to balance the data. The Human Development Index (HDI) of regencies/cities in eastern Indonesia varies widely, so it serves as a case study of data containing outliers and unbalanced classes. The purpose of this study was to compare the performance of random forest before and after applying winsorizing and random oversampling to the data when predicting HDI in regencies/cities in eastern Indonesia. After the outliers and the class imbalance are handled, random forest performs better in terms of accuracy and kappa, which are 96.43% and 93.41%, respectively. Expenditure per capita and mean years of schooling are the most important variables.
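The two preprocessing steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the 1.5 × IQR multiplier, and the toy data are assumptions, since the abstract only states that the limits come from the interquartile range method.

```python
import numpy as np

def winsorize_iqr(x, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with those limits.

    k=1.5 is the conventional multiplier and is an assumption here.
    """
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.clip(x, lower, upper)

def random_oversample(X, y, seed=0):
    """Resample each smaller class with replacement up to the largest class size."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, n in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing rows at random from the class's own rows.
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(parts)
    return X[keep], y[keep]

# Toy data (hypothetical): one extreme value, one rare class.
values = [1, 2, 3, 4, 100]
clipped = winsorize_iqr(values)

X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 0, 1])
X_bal, y_bal = random_oversample(X, y)
```

A classifier such as a random forest would then be fitted to the winsorized, balanced data. Oversampling is commonly applied to the training split only, so that duplicated rows do not leak into the evaluation set.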

Performance of copula and nested error regression models in estimating per capita expenditure of sub-district in Pidie Regency

Hasanah, Notodiputro, Sartono

2023

J. Nat.

In unit-level small area estimation (SAE), the commonly used nested error regression (NER) model assumes normality, which does not always hold. To handle non-normal data, researchers in statistics have developed an approach based on exchangeable and extendible copulas, called the multivariate exchangeable copula (MEC) model. This study compares the performance of parametric MEC and NER models in estimating the sub-district average of per capita expenditure (PCE) in Pidie Regency, Aceh Province. The PCE data have a skewed distribution, the three-parameter skew-normal. The parametric MEC model uses a Gaussian copula from the elliptical family and an empirical best unbiased prediction (EBUP) estimator, while the NER model uses an empirical best linear unbiased prediction (EBLUP) estimator. The results reveal that, at a 95% confidence level, the parametric MEC model outperforms the NER model, with a smaller root mean squared error (RMSE) and a more precise estimate of the sub-district average of PCE. This study highlights the importance of considering the parametric MEC model as an alternative method for skewed data in unit-level SAE. The results have the potential to support the achievement of Goal 1 (to end poverty) and Goal 10 (to reduce inequality) of the Sustainable Development Goals (SDGs) by providing average PCE estimates at the sub-district level.
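For orientation, the NER model referred to in this abstract is usually written in the standard unit-level form below. The notation is an assumption (the abstract gives no formulas): i indexes small areas (here, sub-districts), j indexes units within an area, x_ij holds the auxiliary variables, v_i is the area random effect, and e_ij is the unit-level error.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + e_{ij},
\qquad v_i \sim N(0, \sigma_v^2),
\qquad e_{ij} \sim N(0, \sigma_e^2),
\qquad v_i \text{ independent of } e_{ij}
```

The normality of both v_i and e_ij is the assumption that skewed data such as PCE can violate, which is the motivation for the copula-based alternative.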
