Date/Time: Wednesday, September 17, 2003, 11:00 am - 12:00 pm
Location: U.S. Department of Transportation, Nassif Building, 400 7th St., SW, Room 8240
Title: Preserving Quality and Confidentiality of Tabular Data
Presenter: Lawrence H. Cox, Associate Director, National Center for Health Statistics (NCHS)
Abstract: Standard methods for statistical disclosure limitation (SDL) in tabular data either abbreviate, modify, or suppress from publication the true (original) values of tabular cells. All of these methods are based on satisfying an analytical rule selected by the statistical office to distinguish cells and cell combinations exhibiting unacceptable risk of disclosure (the sensitive cells) from those that do not. The impact of these SDL methods on data analytic outcomes is not well studied but can be shown to be subtle or severe in particular cases. Dandekar and Cox (2002) introduced a method for tabular SDL called controlled tabular adjustment (CTA). CTA replaces the value of each cell failing the analytical rule with a safe value, viz., a value satisfying the rule, and then uses linear programming to adjust the values of the nonsensitive cells to restore additivity of detail to totals throughout the tabular system. The linear programming framework allows adjustments to be selected so as to minimize any of a variety of linear measures of overall distortion to the data, e.g., total of absolute adjustments, total percent of absolute adjustments, etc. Cox and Dandekar (2003) provide further techniques for preserving data quality. While worthwhile, none of these techniques directly addresses the overarching issue: Will statistical analyses of original and disclosure-limited data sets yield comparable results? We provide a mathematical programming framework and algorithms, introduced in Cox and Kelly (2003), that begin to address this issue. Specifically, we demonstrate how to approximately preserve mean values, variances, and correlations when original data are subjected to CTA, and how to ensure that a simple linear regression between original and adjusted data has intercept approximately zero and slope approximately one.
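The quality criteria named at the end of the abstract can be checked on any pair of original and adjusted tables. The sketch below is illustrative only and is not the Cox and Kelly (2003) algorithm: it computes the diagnostics, it does not perform the adjustment. The function name `quality_diagnostics`, the toy cell values, the hand-picked adjustment, and the choice to regress adjusted values on original values are all assumptions made for this example.

```python
from math import sqrt


def quality_diagnostics(original, adjusted):
    """Compare original and adjusted cell values (hypothetical helper).

    Returns means, sample variances, the Pearson correlation, and the
    least-squares slope and intercept of adjusted regressed on original.
    """
    n = len(original)
    if n != len(adjusted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 cells")

    mean_x = sum(original) / n
    mean_y = sum(adjusted) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in original)
    syy = sum((y - mean_y) ** 2 for y in adjusted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(original, adjusted))
    if sxx == 0 or syy == 0:
        raise ValueError("cell values must not all be identical")

    slope = sxy / sxx
    return {
        "mean_original": mean_x,
        "mean_adjusted": mean_y,
        "var_original": sxx / (n - 1),
        "var_adjusted": syy / (n - 1),
        "correlation": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


# Toy one-way table (invented values): six detail cells adding to a total of 220.
original = [20, 50, 10, 70, 30, 40]

# Assumed adjustment: the sensitive cell (10) is moved to a "safe" value (14),
# and two nonsensitive cells absorb the change so the total is still 220.
adjusted = [20, 49, 14, 67, 30, 40]

diag = quality_diagnostics(original, adjusted)
for name, value in diag.items():
    print(f"{name}: {value:.4f}")
```

In this toy case the total, and therefore the mean, is preserved exactly because the adjustments cancel, while the variance, correlation, slope, and intercept show how far the adjusted data drift from the ideal of correlation one, slope one, and intercept zero.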