2015 OnEstimatingtheSwappingRateforC

From GM-RKB

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

When analyzing data, it is important to account for all sources of noise. Public use datasets, such as those provided by the Census Bureau, often undergo additional perturbations designed to protect confidentiality. This source of noise is generally ignored in data analysis because crucial parameters and details about its implementation are withheld. In this paper, we consider the problem of inferring such parameters from the data. Specifically, we target data swapping, a perturbation technique commonly used by the U.S. Census Bureau and which, barring practical breakthroughs in disclosure control, will be used in the foreseeable future. The vanilla version of data swapping selects pairs of records and exchanges some of their attribute values. The number of swapped records is kept secret even though it is needed for data analysis and investigations into the confidentiality protection of individual records. We propose algorithms for estimating the number of swapped records in categorical data, even when the true data distribution is unknown.
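The "vanilla" data swapping that the abstract describes (select pairs of records, exchange some of their attribute values) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the Census Bureau's procedure and not the estimation algorithm proposed in the paper: the function name `swap_records`, its parameters, and the uniform random choice of disjoint pairs are all hypothetical.

```python
import random


def swap_records(records, swap_attrs, num_pairs, seed=None):
    """Return a perturbed copy of `records` (a list of dicts).

    Hypothetical sketch of vanilla data swapping: choose `num_pairs`
    disjoint pairs of records uniformly at random and exchange the
    values of the attributes named in `swap_attrs` within each pair.
    """
    if 2 * num_pairs > len(records):
        raise ValueError("not enough records to form that many disjoint pairs")

    rng = random.Random(seed)
    perturbed = [dict(r) for r in records]  # copy so the input stays untouched

    # Sample 2 * num_pairs distinct indices, then pair them up consecutively.
    chosen = rng.sample(range(len(perturbed)), 2 * num_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        for attr in swap_attrs:
            perturbed[i][attr], perturbed[j][attr] = (
                perturbed[j][attr],
                perturbed[i][attr],
            )
    return perturbed


if __name__ == "__main__":
    data = [
        {"age": "young", "county": "A"},
        {"age": "old", "county": "B"},
        {"age": "young", "county": "B"},
        {"age": "old", "county": "A"},
    ]
    for row in swap_records(data, ["county"], num_pairs=1, seed=0):
        print(row)
```

In this sketch the marginal counts of each swapped attribute are unchanged, while the joint distribution between swapped and unswapped attributes may change. A swapped pair that happens to share the same values yields no visible change, so the number of records that differ from the original can be smaller than `2 * num_pairs`; `num_pairs` plays the role of the secret quantity that the paper aims to estimate.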

References


Daniel Kifer. (2015). "On Estimating the Swapping Rate for Categorical Data." doi:10.1145/2783258.2783369