Below are the courses that will be available to QMSS students during Fall Semester 2017. Course information will continue to be updated as it becomes available. If you see discrepancies between this list and the Columbia Directory of Classes or Vergil, you should default to the details on this page.

Advanced Registration for Fall Semester begins Monday, April 17th. You should check SSOL to see when your first registration appointment is. Full information is available on the GSAS Academic Calendar.


Master's Thesis
Elena Krumova
T 6:10P-8:00P
This course is designed to help you make consistent progress on your master’s thesis throughout the semester, as well as to provide structure during the writing process. The master’s thesis, upon completion, should answer a fundamental research question in the subject matter of your choice. It should be an academic paper based on data that you can acquire, clean, and analyze within a single semester, with an emphasis on clarity and policy relevance. Remember that your thesis is not designed to be the crowning achievement of your career. If you find that the scale of your topic is too great, please choose a limited number of research questions to explore for the master’s thesis. Keep in mind that your time is limited!
Early semester homework: Selecting a topic of interest is often the most difficult part of writing an academic paper, but deciding on the data you will be using is a significant step towards completing a satisfactory dissertation project. We will discuss your data before exploring plausible research designs. If you have elected to change topics from the literature review you prepared for G4010, let me know and begin researching other ideas so that you are prepared to move quickly through the semester.

Data Mining
Benjamin Goodrich
W 6:10P-8:00P
The class is roughly divided into two parts:
1. programming best practices, exploratory data analysis (EDA), and unsupervised learning
2. supervised learning, including regression and classification methods
In the first part of the course we will focus on writing R programs in the context of simulations, data wrangling, and EDA. Unsupervised learning is focused on problems where the outcome variable is not known and the goal of the analysis is to find hidden structure in data, such as different market segments from buying patterns or human population structure from genetics data. Supervised learning deals with prediction problems where the outcome variable is known, such as predicting the price of a house in a certain neighborhood or the outcome of a congressional race.

Topics in Applied Data Science for Social Scientists
Marco Morales
W 06:10P-08:00P
Data Science sits at the intersection of good hacking skills, math & statistics knowledge, and substantive expertise. Social scientists – by virtue of their training – are naturally equipped to find a niche answering “why” questions in data science, which is not “natural” for data scientists coming from other disciplines. This course is intended to: lead students to gauge their potential within data science; expose students to data science practitioners and explore real-life data science applications from a social science perspective; provide hands-on experience addressing real-world data science problems and challenges; and provide training in skills that are in high demand among data scientists but are not currently considered an integral part of social science training.

Research Seminar
GR4021 & GR4022
Gregory Eirich
W 08:10P-10:00P
This course has two goals. One, it is designed to expose students in the QMSS degree program to different methods and practices of social science research. Seminar presentations are given on a wide range of topics by faculty from Columbia and other New York City universities, as well as researchers from other settings. Two, it is also designed to give students important professional development skills, particularly around academic writing, research methods and job skills.
VIEW PREVIOUS SYLLABUS HERE (NOTE: Speakers will differ from last spring)

Data Analysis for the Social Sciences
Elena Krumova
Th 06:10P-08:00P
This course is meant to provide an introduction to probability and social statistics, tailored to the types of analyses and data issues encountered by QMSS students. The chief goal is to help students generate and interpret quantitative data in helpful and provocative ways. The hope is that by trying to measure the social world, students will see their thinking become clearer and their understandings of concepts grow more complex. They will also become competent at reading statistical results in social science publications and in other media. Only basic mathematics skills are assumed, but it is hoped that students will become more facile with numbers, functions and their relationships. Another important goal of the course is to teach students how to manipulate and analyze data themselves using statistical software. We will focus mainly on the program R. There will be an optional lab section every other week, which will be devoted to using this software to practice commands and to develop a paper using the General Social Survey, World Values Survey or another dataset of the student’s choosing.

Advanced Analytic Techniques
Gregory Eirich
F 10:10A-12:00P
This course is meant to train students in advanced quantitative techniques in the social sciences. We will look at four main areas of interest. One: modeling of limited dependent variables, like Poisson, tobit and gamma-distributed variables, will be discussed. Two: modeling of multilevel data, like panel data and geographic data, will also be practiced. Three: ways of better approximating experimental designs with observational data will be highlighted, like instrumental variables, propensity score matching and regression discontinuity. Four: creating and analyzing text as data, including “bag of words” analysis, contextual analysis and topic modeling, will be covered. Another important goal of the course is to teach students how to manipulate and analyze data themselves using statistical software. We will focus mainly on the program R. The last hour of each class will be devoted to using this software program to practice commands and to complete lab assignments. Students ought to be familiar with regression models from other courses, but only basic math will be presumed.

Missing Data
Benjamin Goodrich
F 06:10P-08:00P
The goal of this course is to provide students with a basic knowledge of the potential implications of missing data for their data analyses as well as potential solutions. We will begin by discussing different types of mechanisms that can generate missing data. This will lay the groundwork for discussions of what types of missing data scenarios can be accommodated by each missing data method discussed subsequently. Finally, we will learn how to deal with missing data in Stata. More advanced techniques will be covered in Bayesian Statistics for the Social Sciences in the Spring.