Data Science Initiative: University of California, Irvine

Introduction to Natural Language Processing

Natural language processing (NLP) refers to the methods and technologies that allow computers to process, understand, and perform tasks using human language. Common NLP tasks include sentiment analysis, part-of-speech tagging, named entity recognition, machine translation, document classification, clustering, and topic extraction. This course introduces fundamental concepts in NLP, including word and document representation, text processing, document classification, document similarity, clustering, and dimensionality reduction. The course will be taught using Jupyter notebooks in Python. The NLP tools covered are scikit-learn and NLTK.

Who: This course is aimed primarily at graduate students and researchers who have some experience with machine learning and Python but are new to NLP.

Requirements: Participants must bring a laptop with a few specific software packages installed (see Pre-Workshop Instructions).

Prerequisites: A previous course in programming is strongly recommended. Experience with basic machine learning is recommended.

Contact: Please email ggaut@uci.edu for more information.

Tentative Schedule

Time	Session
8:30-9:00	Sign-in (coffee & bagels)
9:00-10:30	Text Processing and Document Classification
10:30-10:45	Break
10:45-12:30	Document Similarity and Clustering

Syllabus

Introduction/Preparation
- Common NLP Tasks
- Word and Document Representation
- Text Processing
Document Classification
- Text Processing
- TFIDF
- Evaluation
Clustering
- Document Similarity
Dimensionality Reduction (Time Permitting)
- Topic Modeling
- Visualization
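
For a rough sense of what the TFIDF and Document Similarity topics involve, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function names (tokenize, tfidf_vectors, cosine_similarity), the toy documents, and the particular weighting (term frequency times the log of inverse document frequency, with no smoothing) are all assumptions made for illustration. Library implementations such as the one in scikit-learn use somewhat different formulas and defaults.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        # A term that appears in every document gets weight 0.
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell in the market",
]
vecs = tfidf_vectors(docs)
sim_cat_docs = cosine_similarity(vecs[0], vecs[1])
sim_cat_stock = cosine_similarity(vecs[0], vecs[2])
print(sim_cat_docs, sim_cat_stock)
```

In this toy corpus the two documents about cats share an informative word and so score higher than the cat/stock pair, whose only shared word ("the") appears in every document and is weighted to zero.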

Pre-Workshop Instructions

See the course’s GitHub page for instructions: https://github.com/UCIDataScienceInitiative/NLP
