Zacharie Naulet scite author profile

Optimal disclosure risk assessment

Protection against disclosure is a legal and ethical obligation for agencies releasing microdata files for public use. Consider a microdata sample of size n from a finite population of size N = n + λn, with λ > 0, such that each sample record contains two disjoint types of information: identifying categorical information and sensitive information. Any decision about releasing data is supported by the estimation of measures of disclosure risk, which are defined as discrete functionals of the number of sample records with a unique combination of values of the identifying variables. The most common measure is arguably the number τ₁ of sample unique records that are population uniques. In this paper, we first study nonparametric estimation of τ₁ under the Poisson abundance model for sample records. We introduce a class of linear estimators of τ₁ that are simple, computationally efficient and scalable to massive datasets, and we give uniform theoretical guarantees for them. In particular, we show that they provably estimate τ₁ all the way up to the sampling fraction (λ + 1)⁻¹ ∝ (log n)⁻¹, with vanishing normalized mean-square error (NMSE) for large n. We then establish a lower bound for the minimax NMSE for the estimation of τ₁, which allows us to show that: i) (λ + 1)⁻¹ ∝ (log n)⁻¹ is the smallest possible sampling fraction for consistently estimating τ₁; ii) the estimators' NMSE is near optimal, in the sense of matching the minimax lower bound, for large n. This is the main result of our paper, and it provides a rigorous answer to an open question about the feasibility of nonparametric estimation of τ₁ under the Poisson abundance model for a sampling fraction (λ + 1)⁻¹ < 1/2.
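To make the definition of τ₁ concrete, the Python sketch below computes it directly on a simulated population: draw a population of N = n + λn records, take a simple random sample of size n (sampling fraction 1/(λ + 1) when λn is an integer), and count the sample records whose key is unique both in the sample and in the population. The function names (`sample_uniques`, `tau1`, `simulate`) and the Zipf-like key distribution are hypothetical choices made for illustration only; this is neither the Poisson abundance model nor the linear estimators described above, which work from the sample alone.

```python
import random
from collections import Counter


def sample_uniques(sample_keys):
    """Number of sample records whose key occurs exactly once in the sample."""
    counts = Counter(sample_keys)
    return sum(1 for c in counts.values() if c == 1)


def tau1(population_keys, sample_idx):
    """Number of sample uniques that are also population uniques."""
    pop_counts = Counter(population_keys)
    sample_counts = Counter(population_keys[i] for i in sample_idx)
    return sum(
        1 for key, c in sample_counts.items() if c == 1 and pop_counts[key] == 1
    )


def simulate(n, lam, n_cells, seed=0):
    """Toy experiment: population of size N = n + lam * n, simple random sample of size n.

    The Zipf-like key distribution (weight 1/(j+1) for cell j) is an arbitrary
    illustrative assumption, not the model studied in the paper.
    """
    rng = random.Random(seed)
    N = n + int(lam * n)  # lam * n is rounded down to an integer
    cells = list(range(n_cells))
    weights = [1.0 / (j + 1) for j in cells]
    population_keys = rng.choices(cells, weights=weights, k=N)
    sample_idx = rng.sample(range(N), n)  # sampling without replacement
    sample_keys = [population_keys[i] for i in sample_idx]
    return sample_uniques(sample_keys), tau1(population_keys, sample_idx)


if __name__ == "__main__":
    for lam in (1.0, 4.0, 19.0):
        su, t1 = simulate(n=2000, lam=lam, n_cells=50_000)
        print(f"sampling fraction {1 / (lam + 1):.2f}: sample uniques={su}, tau1={t1}")
```

Because the whole population is available in a simulation, τ₁ is computed exactly here; in a real release only the sample is observed, which is why τ₁ has to be estimated.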

Bootstrap estimators for the tail-index and for the count statistics of graphex processes

Naulet, Roy, Sharma et al. (2021)

Electron. J. Statist.

Graphex processes resolve some pathologies of traditional random graph models, notably by providing models that are both projective and allow sparsity. Most of the literature on graphex processes studies them from a probabilistic point of view. Techniques for inferring the parameter of these processes (the so-called graphon) are still marginal; exceptions are a few papers considering parametric families of graphons. Nonparametric estimation has not yet been considered. In this paper, we propose estimators for a selection of functionals of the graphon. Our estimators originate from the subsampling theory for graphex processes and can hence be seen as a form of bootstrap procedure.
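The abstract only names the subsampling idea. As a hedged illustration, the Python sketch below implements p-sampling, the vertex-subsampling scheme used in the graphex literature (keep each vertex independently with probability p, take the induced subgraph, discard isolated vertices), and averages two simple count statistics (non-isolated vertices and edges) over repeated subsamples in a bootstrap-style Monte Carlo loop. The function names and the toy graph are hypothetical, and this is not the estimator proposed in the paper.

```python
import random


def p_sample(edges, p, rng):
    """p-sampling: keep each vertex with probability p, return the induced edge list.

    Isolated vertices are dropped implicitly because only edges are returned.
    """
    vertices = {u for e in edges for u in e}
    kept = {v for v in sorted(vertices) if rng.random() < p}
    return [(u, v) for (u, v) in edges if u in kept and v in kept]


def count_stats(edges):
    """Number of non-isolated vertices and number of edges."""
    vertices = {u for e in edges for u in e}
    return len(vertices), len(edges)


def subsample_means(edges, p, n_rep=200, seed=0):
    """Monte Carlo average of (vertices, edges) over n_rep independent p-samples."""
    rng = random.Random(seed)
    tot_v = tot_e = 0
    for _ in range(n_rep):
        v, e = count_stats(p_sample(edges, p, rng))
        tot_v += v
        tot_e += e
    return tot_v / n_rep, tot_e / n_rep


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy graph: each pair of 300 vertices is an edge with probability 0.02.
    n = 300
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.02]
    print("full graph (vertices, edges):", count_stats(edges))
    for p in (0.25, 0.5, 0.75):
        mv, me = subsample_means(edges, p)
        print(f"p={p}: mean vertices={mv:.1f}, mean edges={me:.1f}, "
              f"p^2*|E|={p * p * len(edges):.1f}")
```

In a simple graph without self-loops each edge survives p-sampling with probability p², so the average edge count should stay close to p²·|E|; this gives a quick sanity check of the subsampling step.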

Some aspects of symmetric Gamma process mixtures

Naulet, Barat (2015)

Preprint

Adaptive Bayesian density estimation in sup-norm

Naulet (2022)

Bernoulli

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Zacharie Naulet

Optimal disclosure risk assessment

Bootstrap estimators for the tail-index and for the count statistics of graphex processes

Some aspects of symmetric Gamma process mixtures

Adaptive Bayesian density estimation in sup-norm
