2021
DOI: 10.1016/j.eswa.2021.115597

Evidential reasoning for preprocessing uncertain categorical data for trustworthy decisions: An application on healthcare and finance

Cited by 50 publications
(9 citation statements)
References 61 publications
“…2) The presence of categorical (qualitative) attributes increases the information uncertainty, so we apply automatic techniques for preprocessing (i.e., transform categorical attributes into numbers) the input data such as one-hot, target and label encoding [38]. In particular, one-hot and label methods were discarded: the former due to its inefficiency when categorical attributes have high cardinality (a large number of categories), and the latter because it induces misunderstanding in the learning process since the value automatically assigned lacks of semantics.…”
Section: Main Contribution
Citation type: mentioning
confidence: 99%
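The three encodings named in the statement above can be compared with a minimal sketch. It is not the citing authors' implementation: the function names and the toy `values`/`targets` data are hypothetical, and the target encoder is the plain per-category mean without the smoothing or cross-fitting that practical implementations usually add.

```python
from collections import defaultdict

def label_encode(values):
    # Integers are assigned in order of first appearance and carry no meaning.
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes))
    return [codes[v] for v in values], codes

def one_hot_encode(values):
    # One output column per distinct category, so width grows with cardinality.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

def target_encode(values, targets):
    # Replace each category by the mean of the target over that category.
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    means = {v: sums[v] / counts[v] for v in counts}
    return [means[v] for v in values], means

# Hypothetical toy data: one categorical attribute and a binary target.
values = ["a", "b", "a", "c", "b", "a"]
targets = [1, 0, 0, 1, 0, 1]

labels, label_map = label_encode(values)
one_hot, categories = one_hot_encode(values)
encoded, category_means = target_encode(values, targets)
```

One-hot output has one column per category, which is the inefficiency the statement attributes to high-cardinality attributes; label codes depend only on the order in which categories are seen; target encoding keeps a single numeric column whose values reflect the target.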
“…We applied some pre-processing steps as the collected dataset were having missing values and class imbalance problems. Referring the CVD dataset, the dataset contained a total of 43400 patient records out of which 14754 values

Attribute: Description
i.d: patient's i.d
gender: includes ("male": 0, "female": 1, "other": 2)
age: patient's age (continuous)
hypertension: suffering from hypertension ("yes":1, "no":0)
heart_disease: suffering heart disease ("yes":1, "no":0)
ever_married: marital status of patient ("yes":1, "no":0)
work_type: job status ("children":0, "govt_job":1, "never_worked":2, "private":3, "self_employed":4)
residence_type: ("rural":0, "urban":1)
avg_glucose_level: average glucose level of blood (continuous)
bmi: body mass index (decimal value)
smoking_status: ("never smoked":0, "formerly smoked":1, "smokes":2)
stroke: ("yes":1, "no":0)

in handling the missing data, however their usage in medical field is limited and specific efficacy for disease detection is not clear [27]. Most of the times, researchers do not consider the observations with missing values and drop the incomplete cases intentionally, since the traditional data imputation methods are not sufficient to capture the missing data complexities in health care applications [28,29].…”
Section: Pre-processing
Citation type: mentioning
confidence: 99%
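The practice the statement above describes, dropping incomplete cases rather than imputing them, can be sketched as follows. This is an illustration only: `drop_incomplete` is a hypothetical helper, the records are made-up values rather than rows from the CVD dataset, and missing entries are assumed to be represented as `None`.

```python
def drop_incomplete(records):
    # Keep only records in which every attribute has a value (complete cases).
    return [r for r in records if all(v is not None for v in r.values())]

# Hypothetical records; None marks a missing value.
records = [
    {"age": 67, "bmi": 36.6, "smoking_status": 1},
    {"age": 61, "bmi": None, "smoking_status": 0},
    {"age": 80, "bmi": 32.5, "smoking_status": None},
    {"age": 49, "bmi": 34.4, "smoking_status": 2},
]

complete = drop_incomplete(records)
dropped = len(records) - len(complete)
```

Here half of the records are discarded, which shows how quickly complete-case analysis shrinks a dataset when missing values are spread across several attributes.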
“…Preprocessing is essential to improve data quality so that machine learning can function properly [19]. An unprocessed dataset is usually ambiguous and incomplete because some of its attributes are missing, either in its inputs or outputs, which may negatively affect the machine learning modeling [20]. Moreover, Qlattice models immediately detect data types; incorrect detection of data types leads to poor machine learning models.…”
Section: Preprocessing Dataset
Citation type: mentioning
confidence: 99%