Predictive Modeling with Python

April 21, 2017

9:00am - 5:00pm

DBH 4011

Instructors: Preston Hinkle, John Schomberg

TAs: Gabe Yu


Python is a popular language for scientific computing and machine learning. This course introduces general modeling concepts through exercises in which participants implement regression models both in plain Python and with the scikit-learn library. The focus is on model fitting and evaluation. Most of the course will be taught through Jupyter notebooks.

Who: This course is targeted primarily at graduate students and researchers who have not already taken a full course in machine learning.

Requirements: Participants must bring a laptop with a few specific software packages installed (see Pre-Workshop Instructions).

Prerequisites: A previous course in programming is strongly recommended.

Contact: Please email the instructors for more information.

Tentative Schedule

8:30-9:00 Sign-in (coffee & bagels)
9:00-10:30 The IPython Notebook and Pandas
10:30-10:45 Break
10:45-12:30 Linear Regression and Predictive Modeling
12:30-1:00 Lunch
1:00-2:30 Out of Sample Prediction
2:30-2:45 Break (coffee)
2:45-4:30 Logistic Regression


  1. Introduction/Preparation
    • NumPy
    • Pandas
  2. Linear Regression
    • Model
    • Gradient descent (GD) solution
    • Ordinary least squares (OLS) solution
  3. Out of Sample Prediction
    • Cross validation
    • Regularization
  4. Logistic Regression
    • Model
    • Gradient descent (GD) solution
    • Newton’s method
    • More Applications
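
For a sense of what the plain-Python exercises in the linear regression unit might look like, here is a minimal sketch of a gradient descent fit for a one-variable linear model. It is not taken from the course materials: the function name `fit_line_gd`, the learning rate, the iteration count, and the toy data are all assumptions chosen for illustration. It uses only the standard library and should run under either Python 2.7 or Python 3.

```python
def fit_line_gd(xs, ys, lr=0.05, n_iter=5000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; lr and n_iter are arbitrary
    choices that happen to converge on the toy data below.
    """
    w, b = 0.0, 0.0
    n = float(len(xs))
    for _ in range(n_iter):
        # Residuals of the current fit on every training point
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up toy data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(xs, ys)
print("slope = %.4f, intercept = %.4f" % (w, b))
```

On noise-free data like this, the fitted slope and intercept approach the values used to generate the points. On real data the step size usually needs tuning, and a library such as scikit-learn handles the fitting for you.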

Pre-Workshop Instructions

See the course’s GitHub page for instructions. We expect you to have the Anaconda Python distribution installed, with Python 2.7 activated.