Natural Language Processing

March 3, 2017

9:00 am - 12:30 pm

DBH 4011

Instructor: Garren Gaut

Introduction to Natural Language Processing

Natural language processing (NLP) refers to the methods and technologies that allow computers to process, understand, and perform tasks using human language. Common NLP tasks include sentiment analysis, part-of-speech tagging, named entity recognition, machine translation, document classification, clustering, and topic extraction. This course introduces fundamental concepts in NLP: word and document representation, text processing, document classification, document similarity, clustering, and dimensionality reduction. The course will be taught in Python using Jupyter notebooks, and the NLP tools covered are scikit-learn and NLTK.

Who: This course is aimed primarily at graduate students and researchers who have some experience with machine learning and Python but are new to NLP.

Requirements: Participants must bring a laptop with a few specific software packages installed (see Pre-Workshop Instructions).

Prerequisites: A previous course in programming is strongly recommended. Experience with basic machine learning is recommended.

Contact: Please email for more information.

Tentative Schedule

8:30-9:00 Sign-in (coffee & bagels)
9:00-10:30 Text Processing and Document Classification
10:30-10:45 Break
10:45-12:30 Document Similarity and Clustering


  1. Introduction/Preparation
    • Common NLP Tasks
    • Word and Document Representation
    • Text Processing
  2. Document Classification
    • Text Processing
    • TF-IDF
    • Evaluation
  3. Clustering
    • Document Similarity
  4. Dimensionality Reduction (Time Permitting)
    • Topic Modeling
    • Visualization
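Several outline topics fit together in a short sketch: representing documents as TF-IDF vectors, measuring document similarity, and clustering. The example below uses standard scikit-learn calls. It is not taken from the workshop notebooks. The four-document corpus is made up for illustration, and the parameter choices (`stop_words="english"`, two clusters, `random_state=0`) are assumptions rather than course settings.

```python
# Minimal sketch (not the workshop's own code): TF-IDF document vectors,
# pairwise cosine similarity, and k-means clustering with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

# Hypothetical toy corpus: two documents about a cat, two about stocks.
docs = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "stock prices fell sharply today",
    "investors sold stock as prices dropped",
]

# Represent each document as a TF-IDF weighted bag of words.
# stop_words="english" drops very common words such as "the" and "a".
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# Pairwise cosine similarity between documents; rows are L2-normalized
# by default, so each document has similarity 1.0 with itself.
sim = cosine_similarity(X)

# Group the documents into two clusters with k-means
# (n_init and random_state are illustrative choices).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(sim.round(2))
print(labels)
```

Documents that share informative terms get a nonzero similarity. The two "cat" documents share "cat", and the two finance documents share "stock" and "prices". Documents with no terms in common score 0. K-means works on the same vectors, so the similarity and clustering steps rely on the same representation.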

Pre-Workshop Instructions

See the course’s GitHub page for instructions: