2012
DOI: 10.1287/isre.1110.0361

Research Note—Generating Shareable Statistical Databases for Business Value: Multiple Imputation with Multimodal Perturbation

Abstract: Business organizations are generating growing volumes of data about their employees, customers, and suppliers. Much of these data cannot be exploited for business value due to privacy and confidentiality concerns. National statistical agencies share sensitive data collected from individuals and businesses by modifying the data so individuals and firms cannot be identified but statistical utility is preserved. We build on this literature to develop a hybrid approach to data masking for business organizations. …

Cited by 13 publications (8 citation statements)
References 37 publications

“…Bootstrapping has been previously used in statistical disclosure limitation. Fienberg (), Fienberg, Makov, and Steele (), Raghunathan, Reiter, and Rubin (), and Melville and McQuaid (), discussed the use of bootstrap‐style resampling for selecting microdata records to be released to the public. Ichim () proposed the use of a quantile‐based bootstrap for generating synthetic data.…”
Section: Existing Methods For Disclosure Prevention In Ras (mentioning)
confidence: 99%
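Bootstrap-style resampling for selecting records to release, as described in the excerpt above, can be sketched as a with-replacement draw over the original microdata records. This is a minimal sketch assuming the records fit in an in-memory sequence; the function name `bootstrap_release` and its `n` and `seed` parameters are hypothetical, and the sketch is not the specific procedure of any of the cited works.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_release(
    records: Sequence[T], n: Optional[int] = None, seed: Optional[int] = None
) -> List[T]:
    """Select records for release by resampling the originals with replacement.

    Hypothetical helper: n defaults to the original sample size, and seed
    makes the draw reproducible.
    """
    if not records:
        raise ValueError("records must be non-empty")
    k = len(records) if n is None else n
    if k < 0:
        raise ValueError("n must be non-negative")
    rng = random.Random(seed)
    # Each draw picks any original record with equal probability, so some
    # records appear several times in the release and others not at all.
    return rng.choices(records, k=k)


if __name__ == "__main__":
    microdata = [(1, 50_000), (2, 62_000), (3, 48_000), (4, 71_000)]
    print(bootstrap_release(microdata, seed=7))
```

Because sampling is with replacement, the released file is a resample rather than a subset: analysts can still estimate means and proportions, while the presence and multiplicity of any given original record in the release are randomized.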
“…Generalization, suppression, and swapping apply to both categorical and numeric data. Noise perturbation adds noise to the original data to disguise their true values (Agrawal and Srikant 2000, Li and Sarkar 2013, Melville and McQuaid 2012), which applies mainly to numeric data. There are also studies that address privacy disclosure problems by hiding sensitive information (Menon et al 2005).…”
Section: Related Work (mentioning)
confidence: 99%
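Additive noise perturbation for numeric data, as described in the excerpt above, can be sketched for a single attribute as x' = x + e, where each e is drawn independently from N(0, (c·s)²), s is the attribute's sample standard deviation, and c is a user-chosen scale. This is a minimal sketch under those assumptions: the Gaussian noise model, the `scale` parameter, and the function name `perturb_additive` are illustrative choices, not the specific scheme of Agrawal and Srikant (2000), Li and Sarkar (2013), or Melville and McQuaid (2012).

```python
import random
import statistics
from typing import List, Optional, Sequence


def perturb_additive(
    values: Sequence[float], scale: float = 0.1, seed: Optional[int] = None
) -> List[float]:
    """Mask a numeric attribute by adding independent Gaussian noise.

    Hypothetical helper: the noise standard deviation is scale times the
    sample standard deviation of the original values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the spread")
    if scale < 0:
        raise ValueError("scale must be non-negative")
    sigma = scale * statistics.stdev(values)
    rng = random.Random(seed)
    # Zero-mean noise: the released values disguise each true value while
    # leaving the attribute's mean unchanged in expectation.
    return [v + rng.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    salaries = [50_000.0, 62_000.0, 48_000.0, 71_000.0, 55_000.0]
    print(perturb_additive(salaries, scale=0.2, seed=42))
```

Because the noise has mean zero and is independent of the data, the attribute's mean is preserved in expectation, while its expected sample variance grows by (c·s)², a factor of 1 + c². A larger c gives stronger masking at the cost of statistical utility.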
“…Applications and analyses based on structured data typically include traditional statistical analysis such as regression and multivariate analysis and data mining applications such as classification and clustering. Anonymization techniques are developed for these applications accordingly (Agrawal and Srikant 2000, Aggarwal and Yu 2008, Fung et al 2010, Li and Sarkar 2011, Melville and McQuaid 2012). For unstructured data such as medical text data, applications often include text mining tasks such as medical keyword-based search query and information extraction, as well as those that are also suitable for structured or semistructured data such as counting query and association analysis (Jensen et al 2012, Meystre et al 2008, Murphy et al 2010).…”
Section: Introduction (mentioning)
confidence: 99%
“…Hence, AI is commonly considered an interdisciplinary research area that attracts considerable attention both in economics and social domains as it offers a myriad of technological breakthroughs with regard to systems security [2]. There is a universal trend of investing in AI technology to face security challenges of our daily lives, such as statistical data, medicine, and transportation [3]. Some claim that specific data from key sectors have supported the development of AI, namely the availability of data from e-commerce [4], businesses [5], and government [6], which provided substantial input to ameliorate diverse machine-learning solutions and algorithms, in particular with respect to systems security [7].…”
Section: Introduction (mentioning)
confidence: 99%