Below are the courses that will be available to QMSS students during Fall Semester 2017. Course information will continue to be updated as it becomes available. If you see discrepancies between this list and the Columbia Directory of Classes or Vergil, you should default to the details on this page.

Advanced Registration for Fall Semester begins Monday, April 17th. You should check SSOL to see when your first registration appointment is. Full information is available on the GSAS Academic Calendar.


UPDATED 12/20/2017

Data Visualization
Thomas Brambor
M 6:10P-8:00P
This course will provide a hands-on introduction to visualizing a wide variety of different types of data. It is aimed at graduate students in the Master of Arts degree program in Quantitative Methods in the Social Sciences (QMSS). The course combines tutorial-style introductions to different software tools and visualization packages centered around the R language, practical tips on analyzing and presenting real data, and some readings and discussion of the principles of data visualization. In the course, we progress from a set of basic static graphs to mapping geographic data, text, social networks, and other forms of data in dynamic and interactive displays. Examples will be drawn from a variety of disciplines in and beyond the social sciences, and you will be encouraged to work with your own data to create custom graphics.

GIS and Spatial Analysis of Social Data
Michael Parrott
M 4:10P-6:00P
This course introduces students to basic spatial analytic skills. It covers introductory concepts and tools in Geographic Information Systems (GIS) and database management. The course also introduces students to the process of developing and writing an original spatial research project. Topics to be covered include: social theories involving space, place and reflexive relationships; social demography concepts and databases; visualizing social data using geographic information systems; exploratory spatial data analysis of social data; spatially weighted regression models; spatial regression models of social data; and space-time models. Use of open-source software (primarily the R software package) will be taught as well.

Social Network Analysis
Greg Eirich
T 10:10A-12:00P
The goal of this course is to introduce students to the main methods, models and concepts behind social network analysis. Over the last few decades, social scientists increasingly have investigated how people's relationships with others affect their health, wealth and popularity, among other things. With the growth of the Internet and on-line communities and social media sites, there is more (and more varied) social network data than ever. In this course, we will learn how to think about, analyze and visually display social network data. The literature on networks has grown to such a degree that it would be impossible to cover all of it in one semester, but we will focus mostly on the core concepts; how they can be incorporated into traditional regression models; and practicing the analyses ourselves. Only basic mathematics skills (like algebra) and a basic knowledge of regression are assumed. Another important goal of the course is to teach students how to manipulate, analyze and visualize network data themselves using statistical software. We will mainly use the program R for most of the software work. Lab assignments will be given out, and we will aim to have weekly lab meetings (which will be completely optional) right after class, but only if a space can be found. Regardless, there will be copies of the code used in lab for students to practice at their convenience.

Master's Thesis
Elena Krumova
T 6:10P-8:00P
This course is designed to help you make consistent progress on your master’s thesis throughout the semester, as well as to provide structure during the writing process. The master’s thesis, upon completion, should answer a fundamental research question in the subject matter of your choice. It should be an academic paper based on data that you can acquire, clean, and analyze within a single semester, with an emphasis on clarity and policy relevance. Remember that your thesis is not designed to be the crowning achievement of your career. If you find that the scale of your topic is too great, please choose a limited number of research questions to explore for the master’s thesis. Keep in mind that your time is limited! Early semester homework: Selecting a topic of interest is often the most difficult part of writing an academic paper, but deciding on the data you will be using is a significant step towards completing a satisfactory dissertation project. We will discuss your data before exploring plausible research designs. If you have elected to change topics from the literature review you prepared for G4010, let me know and begin researching other ideas so that you are prepared to move quickly through the semester.

Bayesian Statistics for the Social Sciences
Benjamin Goodrich
TR 4:10P-5:25P
An introduction to Bayesian statistical methods with applications to the social sciences. This course will be less technical than similar courses sometimes offered by the Statistics Department. Considerable emphasis will be placed on regression modeling and model checking. The primary software used will be Stan, which students do not need to be familiar with in advance. We will access the Stan library via R, so some experience with R would be helpful.

Topics in Applied Data Science for Social Scientists
Marco Morales
W 6:10P-8:00P
In his now classic Venn diagram, Drew Conway described Data Science as sitting at the intersection between good hacking skills, math and statistics knowledge, and substantive expertise. By training, social scientists possess a fluid combination of all three, but also bring an additional layer to the mix. We have acquired slightly different training, skills and expertise tailored to understand human behavior, and to explain why things happen the way they do. Social scientists are, thus, a particular kind of data scientist. This course is not intended to teach you how to code, create visualizations, or estimate models. It presumes you have learned that in other classes. This course is intended to take you to the next level in becoming a data scientist.

Modern Data Structures
Michael Parrott
W 6:10P-8:00P
This course is intended to provide a detailed tour of how to access, clean, “munge” and organize data, both big and small. (It should also give students a flavor of what would be expected of them in a typical data science interview.) Each week will have simple, moderate and complex examples in class, with code to follow. Students will then practice additional exercises at home. The end point of each project will be to get the data organized and cleaned enough so that it is in a data frame, ready for subsequent analysis and graphing. Therefore, no analysis or visualization (beyond just basic tables and plots to make sure everything was correctly organized) will be taught, and this will free up substantial time for the “nitty-gritty” of all of this data wrangling.

Research Seminar
GR4021 & GR4022
Gregory Eirich
W 8:10P-10:00P
This course has two goals. One, it is designed to expose students in the QMSS degree program to different methods and practices of social science research. Seminar presentations are given on a wide range of topics by faculty from Columbia and other New York City universities, as well as researchers from other settings. Two, it is also designed to give students important professional development skills, particularly around academic writing, research methods and job skills.
VIEW PREVIOUS SYLLABUS HERE (NOTE: Speakers will differ from last spring)

Data Analysis for the Social Sciences
Christy Baker-Smith
R 6:10P-8:00P
This course is meant to provide an introduction to probability and social statistics, tailored to the types of analyses and data issues encountered by QMSS students. The chief goal is to help students generate and interpret quantitative data in helpful and provocative ways. The hope is that by trying to measure the social world, students will see their thinking become clearer and their understandings of concepts grow more complex. They will also become competent at reading statistical results in social science publications and in other media. Only basic mathematics skills are assumed, but it is hoped that students will become more facile with numbers, functions and their relationships. Another important goal of the course is to teach students how to manipulate and analyze data themselves using statistical software. We will focus mainly on the program R. There will be an optional lab section every other week, which will be devoted to using the software to practice commands and to develop a paper using the General Social Survey, World Values Survey or another dataset of the student’s choosing.

Data Mining
Michael Parrott
R 6:10P-8:00P
The class is roughly divided into two parts: (1) programming best practices, exploratory data analysis (EDA), and unsupervised learning; and (2) supervised learning, including regression and classification methods. In the first part of the course we will focus on writing R programs in the context of simulations, data wrangling, and EDA. Unsupervised learning is focused on problems where the outcome variable is not known and the goal of the analysis is to find hidden structure in data, such as different market segments from buying patterns or human population structure from genetics data. Supervised learning deals with prediction problems where the outcome variable is known, such as predicting the price of a house in a certain neighborhood or the outcome of a congressional race.

Advanced Analytic Techniques
Greg Eirich
F 10:10A-12:00P
This course is meant to train students in advanced quantitative techniques in the social sciences. We will look at four main areas of interest. One -- modeling of limited dependent variables, like Poisson, tobit and gamma-distributed outcomes, will be discussed. Two -- creating and analyzing text as data, including “bag of words” analysis, contextual analysis and topic modeling. Three -- ways of better approximating experimental designs with observational data will be highlighted, like instrumental variables, propensity score matching and regression discontinuity. Finally, four -- modeling of multilevel data, like panel data and geographic data, will also be practiced. Another important goal of the course is to teach students how to manipulate, analyze and visualize network data themselves using statistical software. We will mainly use the program R for most of the software work. Lab assignments will be given out, and we will aim to have weekly lab meetings (which will be completely optional) right after class, but only if a space can be found. Regardless, there will be copies of the code used in lab for students to practice at their convenience. Students ought to be familiar with regression models from other courses, but only basic math will be presumed.

Non-QMSS Concentration Classes


Advanced Microeconomics
Susan M Elmes
MW 4:10pm-5:25pm
The course provides a rigorous introduction to microeconomics. Topics will vary with the instructor but will include consumer theory, producer theory, general equilibrium and welfare, social choice theory, game theory and information economics. This course is strongly recommended for students considering graduate work in economics.


Machine Learning for Data Science
COMS W4721
Daniel Hsu
MW 5:40pm-6:55pm
COMS 4721 is a graduate-level introduction to machine learning. The course covers basic statistical principles of supervised machine learning, as well as some common algorithmic paradigms. Additional topics, such as representation learning and online learning, may be covered if time permits.

Exploratory Data Analysis and Visualization
Joyce T Robbins
TR 5:40pm-6:55pm
A course in computer programming. This course covers visual approaches to exploratory data analysis, with a focus on graphical techniques for finding patterns in high dimensional datasets. We consider data from a variety of fields, which may be continuous, categorical, hierarchical, temporal, and/or spatial in nature. We cover visual approaches to selecting, interpreting, and evaluating models/algorithms such as linear regression, time series analysis, clustering, and classification.