
Analyze Survey Data for Free (http://asdfree.com/)

Anthony Damico (Independent Consultant) - United States

Keywords: Data Analysis, Methods, Sample Design, R Programming, Open Source


Abstract

Governments, NGOs, and other research institutes spend billions of dollars each year collecting demographic, economic, and health information about their populations. These efforts form the basis of many official reports, academic journal articles, and public health surveillance systems, each of which motivates public policy or informs the public to varying degrees. Depending on the sensitivity of the topic, these sponsoring organizations often publish household-level, person-level, or company-level datasets alongside their final summary report. These response-level data (commonly known as microdata) allow external researchers both to reproduce the original findings and to focus more deeply on segments of the population that the original investigators' data products may not discuss. For example, the Census Bureau publishes an annual report, "Income and Poverty in the United States," with a series of tables, and also a database with one record per individual within each sampled household. While the Bureau helpfully provides many different cross-tabulations of its results, an external researcher might find utility in this dataset by investigating other groups (such as different age cutoffs or dollar thresholds), and so the public microdata files allow continued research where it otherwise might end. The website http://asdfree.com/ offers obsessively detailed instructions to analyze a wide variety of publicly available datasets using the R language. This resource generally contains three core components, each with step-by-step instructions: (1) download automation or data acquisition; (2) helpfully noted analysis examples; (3) replication of published estimates to prove correct methodology.