On Using Linear Diophantine Equations for in-Parallel Hiding of Decision Tree Rules

Feretzakis, Georgios; Kalles, Dimitris; Verykios, Vassilios S.

doi:10.3390/e21010066

Cited by 10 publications

(4 citation statements)

References 21 publications


“…The first one is that we do not need to add new instances to the original data set, and the second is that our new heuristic can be performed in only one step with much lower computational complexity compared to solving systems of Linear Diophantine Equations. However, our previous published techniques [20, 21] guarantee the preservation of entropy values in every node of the tree before and after the modification.…”

Section: Discussion (mentioning; confidence: 99%)
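The citation statement above contrasts a one-step heuristic with solving systems of Linear Diophantine Equations. As background only, here is a minimal sketch of the textbook building block for a single two-variable equation a·x + b·y = c, solved with the extended Euclidean algorithm. It is not the authors' formulation, which involves systems of such equations, and the function names are hypothetical. The key fact is that integer solutions exist exactly when gcd(a, b) divides c.

```python
from typing import Optional, Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g, where |g| == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x1, y1 = extended_gcd(b, a % b)
    # b*x1 + (a % b)*y1 == g and a % b == a - (a // b)*b, so regroup by a and b.
    return g, y1, x1 - (a // b) * y1


def solve_linear_diophantine(a: int, b: int, c: int) -> Optional[Tuple[int, int]]:
    """Return one integer solution (x, y) of a*x + b*y == c, or None if none exists.

    With g as returned by extended_gcd, every solution has the form
    (x + (b // g) * t, y - (a // g) * t) for some integer t.
    """
    if a == 0 and b == 0:
        # Degenerate case: 0 == c is either always or never satisfied.
        return (0, 0) if c == 0 else None
    g, x0, y0 = extended_gcd(a, b)
    if c % g != 0:  # solvable only if gcd(a, b) divides c
        return None
    k = c // g
    return x0 * k, y0 * k


print(solve_linear_diophantine(3, 5, 7))  # -> (14, -7)
print(solve_linear_diophantine(4, 6, 5))  # -> None
```

Here (14, -7) works because 3·14 + 5·(-7) = 7, and adding multiples of (5, -3) gives every other solution. In a data-sanitization setting the unknowns would presumably be instance counts, so one would additionally need non-negative solutions; that constraint is an assumption here and is not handled by this sketch.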

“…This approach is critical because the sanitized data set may be subsequently published and even shared with the data set owner’s competitors, as can be the case with retail banking [19]. We extend this work in the papers [20, 21] by formulating a generic look ahead technique that considers the structure of the decision tree from an affected leaf to the root. The main contribution of these publications was to improve the Swap-and-Add pass by following a look ahead approach instead of the greedy approach which was previously used.…”

Section: Introduction (mentioning; confidence: 99%)


Using Minimum Local Distortion to Hide Decision Tree Rules

Feretzakis, Kalles, Verykios (2019), Entropy (self-citation)


The sharing of data among organizations has become an increasingly common procedure in several areas like banking, electronic commerce, advertising, marketing, health, and insurance sectors. However, any organization will most likely try to keep some patterns hidden once it shares its datasets with others. This article focuses on preserving the privacy of sensitive patterns when inducing decision trees. We propose a heuristic approach that can be used to hide a certain rule which can be inferred from the derivation of a binary decision tree. This hiding method is preferred over other heuristic solutions like output perturbation or cryptographic techniques—which limit the usability of the data—since the raw data itself is readily available for public use. This method can be used to hide decision tree rules with a minimum impact on all other rules derived.


Section: Discussion (mentioning; confidence: 99%)

Section: Introduction (mentioning; confidence: 99%)


“…In the learning phase, the rules are derived (tree generation) and in an accuracy verification phase, random data taken from the training set is tested and rules are adjusted in order to decrease the tree size (tree pruning); in the end the unlabeled data points are classified with the rules thus developed and tested [70,71]. Simplicity, transparency, easiness to understand and to implement [72,73] are key advantages of the decision tree classifier. The key parameter influencing the tree's performance is its maximum depth, as it decides its complexity [74]; in our models, this parameter had values between two and four.…”

Section: Classification Algorithms (mentioning; confidence: 99%)

QSAR Models for Active Substances against Pseudomonas aeruginosa Using Disk-Diffusion Test Data

2021


Pseudomonas aeruginosa is a Gram-negative bacillus included among the six “ESKAPE” microbial species with an outstanding ability to “escape” currently used antibiotics and developing new antibiotics against it is of the highest priority. Whereas minimum inhibitory concentration (MIC) values against Pseudomonas aeruginosa have been used previously for QSAR model development, disk diffusion results (inhibition zones) have not been apparently used for this purpose in the literature and we decided to explore their use in this sense. We developed multiple QSAR methods using several machine learning algorithms (support vector classifier, K nearest neighbors, random forest classifier, decision tree classifier, AdaBoost classifier, logistic regression and naïve Bayes classifier). We used four sets of molecular descriptors and fingerprints and three different methods of data balancing, together with the “native” data set. In total, 32 models were built for each set of descriptors or fingerprint and balancing method, of which 28 were selected and stacked to create meta-models. In terms of balanced accuracy, the best performance was provided by KNN, logistic regression and decision tree classifier, but the ensemble method had slightly superior results in nested cross-validation.


“…In articles [5][6][7][8], the authors proposed a series of strategies that would effectively protect against the disclosure of the sensitive classification rules. The LDH algorithm [9] was developed on the basis of the concept of preserving sensitive DT rules resulting from the use of data mining techniques.…”

Section: Introduction (mentioning; confidence: 99%)

Inference Control in a Diabetes Data Set Using a Java-Based Prototype of LDH Algorithm

Feretzakis, Mitropoulos, Kalles, et al. (2022), Studies in Health Technology and Informatics (self-citation)


Data sharing among different entities in the healthcare domain has become an increasingly common practice, where each entity would most likely want to prevent indirect data disclosure via inference channels. The Local Distortion Hiding (LDH) algorithm has been developed to protect sensitive decision tree (DT) rules, which are chosen not to be disclosed when DT construction techniques are applied to the data. This article presents eight experiments using a Java-based prototype that implements the LDH algorithm in a diabetes data set. Our experiments test the ability of the LDH algorithm in two ways: first in inference control, and second in maintaining the structure and the performance metrics of the resulting DT. Our experiments on hiding eight terminal nodes in a diabetes data set using a Java-based prototype that implements the LDH algorithm yield satisfactory results.

