Introduction to Machine Learning for Epidemiologists

Saturday, June 1, 2019 - Sunday, June 30, 2019  

Course Description

This course will explore how epidemiologists can use machine learning to advance their research. The first module will provide a general introduction to machine learning and its utility for epidemiologists. The second module will introduce the different algorithms and validation techniques used in various disciplines. The remaining three modules will each focus on a specific application of machine learning within the field of epidemiology, providing clear examples from the scientific literature. Each module will include hands-on programming exercises in R/R Studio to provide practical experience in the application of machine learning for epidemiologic research. Readings and examples will cover multiple substantive areas of epidemiology.

Course Objectives

The primary objective of this course is to provide individuals with broad exposure to machine learning and its practical applications within epidemiology.


By the end of the course, students will be able to:


  • Discuss the scenarios where machine learning can (or cannot) enhance epidemiologic research
  • Identify and describe various learning algorithms
  • Review the process of evaluating learning algorithms and model selection
  • Demonstrate the ability to use analytic tools that promote reproducibility
  • Apply learning algorithms to data and evaluate resulting models
  • Compare different machine learning approaches to address common challenges in epidemiologic research


Introductory epidemiology and biostatistics knowledge is assumed. Prior experience with R programming is extremely helpful, although sample code will be provided. Exercises could also be completed with other software or programming languages such as Python. However, sample programming code and support will only be provided in R.




To follow along in guided analyses and complete course exercises, individuals will need access to R/R Studio. Both R and R Studio need to be downloaded and installed, and R must be installed first.

Course Reading List

Scientific articles are listed within each of the five modules. These recommended readings will provide background for the module's topics and/or provide a clear example of machine learning implementation for epidemiologic research. The following textbooks are recommended for students who are interested in more detailed discussion of machine learning.


Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R. New York: Springer, 2013. ISBN: 978-1-4614-7137-0 DOI: 10.1007/978-1-4614-7138-7.


Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Second Edition. New York: Springer, 2009. ISBN: 978-0-387-84857-0 DOI: 10.1007/b94608.


Jeanette Stingone, PhD, MPH

Dr. Jeanette Stingone is a formally trained environmental epidemiologist with a focus on perinatal and pediatric health. She conducts research that couples data science techniques with epidemiologic methods to investigate how prenatal and early-life environmental exposures affect health and development throughout childhood and beyond. Currently, she is investigating how machine learning approaches can be used to uncover the combinations of multiple environmental exposures that contribute to disease and disability in children, including birth defects, adverse neurodevelopment and early puberty. Dr. Stingone also has a strong interest in the use of collective science initiatives to advance public health research, and works to develop methods and approaches for data harmonization across diverse studies of environmental health.

Course Fee

Registration is $925.00.


The registration period has closed for this event.

Online Course Format

This is a month-long digital course, equivalent to approximately 20 hours of classroom instruction. Lectures and course material will be presented online in roughly weekly segments. The flexible format will include video or audio recordings of lecture material, file sharing and topical discussion, self-assessment exercises, and access to the instructor for feedback during the course. The course uses the learning management software Canvas; participants will receive an e-mail inviting them to join on the first day of the course. Any additional information about technical requirements and access to the course will be shared in the weeks before the course begins.
