2018
DOI: 10.2991/ijcis.11.1.97

Dealing with Missing Data using a Selection Algorithm on Rough Sets

Abstract: This paper discusses the so-called missing data problem, i.e., the problem of imputing missing values in information systems. A new algorithm, called the ARSI algorithm, is proposed to address the imputation of missing values in categorical databases using the framework of rough set theory. It can be seen as a refinement of the ROUSTIDA algorithm and combines the approach of a generalized non-symmetric similarity relation with a generalized discernibility matrix to predict the missing values…

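The abstract only names the building blocks of ARSI (a generalized non-symmetric similarity relation plus a generalized discernibility matrix over a categorical information system), so the following Python sketch is not the ARSI algorithm itself. It is a minimal, hedged illustration of the underlying tolerance-based imputation idea that ROUSTIDA-style methods refine: treat a missing entry as compatible with anything, find the objects tolerant with an incomplete record, and fill each gap with the majority value among them. The names tolerant, impute, and the toy data are illustrative assumptions, not taken from the paper.

from collections import Counter

MISSING = None  # marker for a missing categorical value

def tolerant(x, y):
    """Generalized similarity: x and y agree on every attribute where both are known."""
    return all(a == b for a, b in zip(x, y) if a is not MISSING and b is not MISSING)

def impute(records):
    """Fill missing values by majority vote over tolerant (similar) records.

    A ROUSTIDA-style sketch, not the ARSI algorithm: for each incomplete
    record, collect the records it is tolerant with and take the most
    frequent known value of each missing attribute among them.
    """
    completed = [list(r) for r in records]
    for i, row in enumerate(records):
        similar = [r for j, r in enumerate(records) if j != i and tolerant(row, r)]
        for k, value in enumerate(row):
            if value is MISSING:
                candidates = Counter(r[k] for r in similar if r[k] is not MISSING)
                if candidates:
                    completed[i][k] = candidates.most_common(1)[0][0]
    return completed

if __name__ == "__main__":
    # Toy categorical information system with missing entries (None).
    data = [
        ("high", "yes", "red"),
        ("high", "yes", None),
        ("low", "no", "blue"),
        ("low", None, "blue"),
    ]
    for row in impute(data):
        print(row)

Running the example fills the second record's colour with "red" and the fourth record's middle attribute with "no", since in each case exactly one other record is tolerant with the incomplete one; the paper's contribution lies in how candidate values are ranked when several tolerant objects disagree, which this sketch resolves only by simple majority.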
Cited by 2 publications (1 citation statement) | References 21 publications
“…This method can directly process multiple missing values; however, it performs poorly when running on large data sets [4]. Rough set theory is an effective way to deal with uncertainty, but it cannot perform parameter optimization or classify missing data, which results in low precision of data completion [5]. Random forest (RF) can process high-dimensional data with high accuracy, but it incurs a large computational cost when processing large amounts of data [6].…”
Section: Introduction (mentioning)
Confidence: 99%