Analysis of Complex Survey Data

Tuesday, June 1, 2021 - Wednesday, June 30, 2021


Course Description

Complex survey data violate the usual assumption of a simple random sample of independent observations and therefore require specialized statistical techniques. This course will give participants practical skills for analyzing data that arise from complex epidemiologic sampling designs. We will discuss the theory behind complex sampling strategies and explain why appropriate statistical techniques are needed to analyze these data and make valid inferences. Data from the National Survey on Drug Use and Health (NSDUH) will be used for applied demonstrations, illustrating concepts that apply to any dataset arising from a complex survey design. We will demonstrate the appropriate use of sampling weights in the NSDUH data and show how the choice of weight depends on the research question being asked. We will also demonstrate how to obtain basic descriptive statistics, appropriate variance estimates, and regression parameters in R (SAS and Stata code will also be provided).
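As a rough illustration of why sampling weights and design-based variance estimates matter, the sketch below computes a weighted mean and a Taylor-linearized standard error for a tiny, made-up stratified cluster sample. It is written in plain Python so that it is self-contained, not in the R, SAS, or Stata code the course provides. The records, the field names (`stratum`, `psu`, `weight`, `y`), and the helper functions are all hypothetical and are not taken from NSDUH or the course materials. The sketch assumes primary sampling units (PSUs) are treated as sampled with replacement within strata, which is a common simplification.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean(records):
    """Weighted (ratio) estimate of the mean of y."""
    total_w = sum(r["weight"] for r in records)
    return sum(r["weight"] * r["y"] for r in records) / total_w


def design_based_se(records):
    """Taylor-linearized SE of the weighted mean.

    Assumes PSUs are sampled with replacement within strata.
    """
    total_w = sum(r["weight"] for r in records)
    mean = weighted_mean(records)

    # Sum the linearized values within each PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in records:
        z = r["weight"] * (r["y"] - mean) / total_w
        psu_totals[r["stratum"]][r["psu"]] += z

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return sqrt(variance)


def naive_se(records):
    """SE of the unweighted mean, ignoring weights, strata, and clusters."""
    n = len(records)
    ys = [r["y"] for r in records]
    m = sum(ys) / n
    s2 = sum((y - m) ** 2 for y in ys) / (n - 1)
    return sqrt(s2 / n)


# Hypothetical toy sample: 2 strata, 2 PSUs per stratum, 2 respondents per PSU.
# y is a 0/1 indicator (for example, reporting some behavior).
records = [
    {"stratum": 1, "psu": 1, "weight": 100, "y": 1},
    {"stratum": 1, "psu": 1, "weight": 100, "y": 1},
    {"stratum": 1, "psu": 2, "weight": 100, "y": 0},
    {"stratum": 1, "psu": 2, "weight": 100, "y": 0},
    {"stratum": 2, "psu": 1, "weight": 300, "y": 0},
    {"stratum": 2, "psu": 1, "weight": 300, "y": 0},
    {"stratum": 2, "psu": 2, "weight": 300, "y": 1},
    {"stratum": 2, "psu": 2, "weight": 300, "y": 0},
]

unweighted = sum(r["y"] for r in records) / len(records)
print("unweighted mean:", unweighted)
print("weighted mean:  ", weighted_mean(records))
print("naive SE:       ", naive_se(records))
print("design-based SE:", design_based_se(records))
```

In this toy sample the weighted estimate differs from the unweighted one, and the design-based standard error is larger than the naive one that treats the observations as a simple random sample.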

Course Objectives

Students who successfully complete this course will be able to:


  • Understand the theoretical basis for complex survey designs and their relationship to external validity.
  • Understand the influence of design effects on standard errors and why special statistical procedures are needed to analyze complex survey data.
  • Be familiar with the National Survey on Drug Use and Health (NSDUH).
  • Use R software to analyze complex survey data including: basic descriptive analysis, bivariate analysis, and regression models (SAS and Stata code will also be provided).



Course Reading List



  1. Heeringa, S.G., West, B.T., Berglund, P.A. 2010. Applied Survey Data Analysis. Chapman and Hall/CRC. (selected chapters)
  2. Martins, S. S., Santaella-Tenorio, J., Marshall, B. D., Maldonado, A., & Cerdá, M. (2015). Racial/ethnic differences in trends in heroin use and heroin-related risk behaviors among nonmedical prescription opioid users. Drug and Alcohol Dependence, 151, 278-283.
  3. Survey Analysis in R.




  1. 2012 National Survey on Drug Use and Health sample design report.
  2. Kreuter F, Valliant R. A survey on survey statistics: What is done and can be done in Stata. The Stata Journal. 2007, 7(1):1-21.
  3. Lumley, T. (2011). Complex surveys: a guide to analysis using R (Vol. 565). John Wiley & Sons.
  4. Nadimpalli, V., Hubbell, K. 2012. Simplifying the Analysis of Complex Survey Data Using the SAS® Survey Analysis Procedures.


Natalie Levy, MPH, PhD Candidate

Natalie Levy is a sixth-year doctoral student and a graduate research assistant in the Department of Epidemiology at the Columbia University Mailman School of Public Health. Prior to beginning the doctoral program, Natalie worked at the New York City Department of Health and Mental Hygiene where she developed expertise in managing and analyzing large datasets. Natalie’s research often relies on the analysis of complex survey data, including the National Survey on Drug Use and Health, and encompasses work on early-life predictors of gambling and problem gambling; the effects of medical and recreational cannabis laws on cannabis use, cannabis use disorder, and perceptions; domestic violence; and epidemiologic methods.

Luis Segura, MD, MPH, DrPH Candidate

Luis Segura is a sixth-year doctoral candidate in the Department of Epidemiology at the Columbia University Mailman School of Public Health. He completed his MD at the Autonomous University of Nuevo León (UANL) and his MPH at the National Institute of Public Health in Cuernavaca, Mexico. As part of Dr. Silvia S. Martins’s research team, he analyzes complex survey data, including the National Survey on Drug Use and Health, to estimate the effect of drug policies on changes over time in prescription opioid use, marijuana use, and substance use disorders. He is most interested in applying epidemiological causal inference methods in the field of substance abuse and mental health.

Course Fee

Early registration discount before April 1, 2021: NA
After April 1, 2021: $250.00


The registration period has closed for this event.

Online Course Format

This is a short digital course, equivalent to approximately 5 hours of classroom instruction. Lectures and course material will be presented online. The flexible format will include video or audio recordings of lecture material, file sharing and topical discussion fora, self-assessment exercises, real-time electronic office hours and access to instructors for feedback during the course. Registrants for EPIC digital courses should have high-speed internet access. Any additional information about technical requirements and access to the course will be provided the month before the course begins.
