Local differential privacy (LDP) is a strict definition used to protect respondents in a distributed statistical survey against an untrusted survey organizer. Respondents randomize their records so that any pair of records is hard to distinguish in the system model, and LDP quantifies this indistinguishability with a non-negative scalar ε. However, satisfying the requirements for larger record domains with a small ε can be challenging. One solution is top coding, which censors records that exceed a given threshold. Top coding reduces the range of the data and makes it easier to satisfy the requirements; however, it can also undermine the records' utility because it reduces the information they contain. In this study, we attempted to determine when top coding effectively balances privacy and estimator performance in LDP statistical estimation by analyzing the minimax risk, which measures the difficulty of an estimation problem. We classified estimation problems into three classes, characterized by the structure of the statistics and prior knowledge about the candidate populations, and derived upper and lower bounds of the minimax risks for them. The bounds suggest that (i) in the first class, the minimax risk can be arbitrarily large; (ii) in the second class, the minimax risk can always be small; and (iii) in the third class, only a moderate-size threshold gives a small minimax risk. Our findings suggest that practitioners should choose the statistic and threshold carefully and collect prior knowledge about the population in order to estimate the statistic with a small estimation error.
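To make the mechanism discussed above concrete, the following is a minimal sketch of top coding combined with a standard ε-LDP release: each respondent clips their record at the threshold and then adds Laplace noise calibrated to the reduced range. The function names and the choice of the Laplace mechanism are illustrative assumptions, not the paper's exact construction.

```python
import random


def top_code(value, threshold):
    # Top coding: censor any record that exceeds the threshold.
    return min(value, threshold)


def ldp_release(value, threshold, epsilon):
    """Release a single non-negative record under epsilon-LDP.

    Illustrative construction (not necessarily the paper's mechanism):
    top coding bounds the record to [0, threshold], so Laplace noise
    with scale threshold/epsilon suffices for the mean query.
    """
    clipped = top_code(value, threshold)
    scale = threshold / epsilon
    # Laplace(0, scale) as the difference of two i.i.d. exponentials.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return clipped + noise
```

A smaller threshold shrinks the noise scale (better for privacy at fixed ε) but censors more records, which is exactly the utility trade-off the minimax analysis quantifies.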
Keywords: Top coding · Minimax risk · Local differential privacy · Data cleaning · Inequality
Corresponding author: Hajime Ono