Digital Acquisition of Big Data

Monday, June 20, 2016 - Tuesday, June 21, 2016, 10:00 AM - 3:30 PM

Course Description

This course introduces participants to methods for acquiring data from online sources. We will focus on three sources: Twitter, Socrata, and Instagram. Each uses a different interface, so surveying all three gives an overview of how to acquire text, numeric data, and images. We will discuss how online media use application programming interfaces (APIs) to make their data available, how acquiring data in chunks differs from streaming it, and which formats are available for storing data, whether in the cloud or on a local computer. We will briefly touch on the curation and maintenance of data sets. We will program in Python and shell scripts.
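
As a rough illustration of acquiring data in chunks, the Python sketch below requests a fixed number of records at a time and stops when the source runs dry. It is not taken from the course materials. The names fetch_in_chunks, fetch_page, and make_fake_source are hypothetical, and the in-memory source stands in for a real API call; many web APIs accept some form of limit and offset, but the exact parameters differ by service and should be checked against that service's documentation.

```python
from typing import Any, Callable, Dict, Iterator, List

Record = Dict[str, Any]


def fetch_in_chunks(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 100,
) -> Iterator[Record]:
    """Yield records one at a time, requesting at most page_size per call.

    fetch_page(offset, limit) is a hypothetical callable that returns up to
    `limit` records starting at position `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break  # nothing left to read
        yield from page
        if len(page) < page_size:
            break  # a short page means this was the last one
        offset += len(page)


def make_fake_source(records: List[Record]) -> Callable[[int, int], List[Record]]:
    """Build an in-memory stand-in for a paged API (assumption: no network)."""

    def fetch_page(offset: int, limit: int) -> List[Record]:
        return records[offset:offset + limit]

    return fetch_page


if __name__ == "__main__":
    source = make_fake_source([{"id": i} for i in range(250)])
    rows = list(fetch_in_chunks(source, page_size=100))
    print(len(rows))
```

Because the function is a generator, records can be processed or written to disk as each chunk arrives, without holding the whole data set in memory. Streaming differs in that the server pushes new records over a long-lived connection instead of answering repeated requests.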

This course is eligible for an EPIC scholarship. Visit the scholarship application page for more information.

Course Objectives

By the end of the course, participants will be able to:

  1. Access Twitter to create a data set on the topic of each participant's choosing
  2. Access Instagram to create a data set of images as a companion to (1)
  3. Access Socrata to retrieve a data set from the CDC
  4. Articulate the principles of maintaining and sharing large data sets


Familiarity with Linux, Python, and Bash will be helpful but is not required. Prior programming experience will also be helpful but is not required.

Course Reading List

A detailed syllabus will be sent out at least two weeks before the start date.


Michael Chary, MD, PhD

Dr. Michael Chary received his MD and PhD (computational neuroscience, electrophysiology) from the Icahn School of Medicine at Mount Sinai. He completed residency in Emergency Medicine at New York-Presbyterian/Queens and is now a fellow in medical toxicology at Boston Children's Hospital. Michael's research has two focuses: (1) the analysis of social media for public health research and drug discovery, using computational linguistics and applied mathematics, and (2) the development of explainable AI for clinical decision support. His group, ToxTweet, was the first to demonstrate that the geographic distribution of opioid use could be accurately estimated from Twitter and that the dosage, signs, and symptoms of dextromethorphan use could be inferred from YouTube comments. Michael has been teaching for EPIC since 2014.

Course Fee

Registration is $450.00


The registration period has closed for this event.


Hammer 322

701 West 168th Street
New York, NY 10032

