Research Access to Restricted-Use Data

In last issue’s column, John Abowd and Lars Vilhuber discussed the interplay between science, confidentiality, and the public good and outlined some of the data access modalities, in particular a synthetic data model implemented by the Cornell Virtual Research Data Center. In this month’s column, Saki Kinney and Alan Karr from the National Institute of Statistical Sciences give details about how access is operationalized and offer practical guidelines and considerations for researchers seeking access to confidential data from government agencies.

Official statistics agencies in the United States and other countries have long faced conflicts between two of their many missions. On the one hand, these agencies are charged with collecting vast amounts of high-quality data about individuals and about establishments such as businesses, health care providers, and universities, in a manner that protects the privacy of data subjects and the confidentiality of databases. On the other hand, they must disseminate information for diverse purposes, the most important of which include formulating and evaluating policies and supporting research conducted by academics, other government agencies, and private citizens. Increasingly, data are also being collected and held by the states. Most notably, state education agencies (SEAs) are building statewide longitudinal data systems (SLDS) containing individual-level data on students and teachers in public schools. Even though SLDS were built under pressure from, and in many cases with funds provided by, the U.S. Department of Education, they are owned and controlled by the states.

