This course gives a comprehensive overview of the key concepts in natural language processing (NLP) and the techniques used for statistical modeling of natural language data. We will introduce and discuss several NLP tasks, such as sentiment analysis, information extraction, language modeling, syntactic parsing, and semantic analysis. The course is divided into three modules: (1) key statistical machine learning methods in NLP, (2) computational linguistic tasks and modeling approaches, and (3) generative AI, including large language models (LLMs).
Lectures are held on Tuesdays and Thursdays at 1:30-2:45pm in Stanley Coulter Hall Room 239.
We will use Ed for class discussion and announcements, including announcements regarding assignments. If you are not in the Ed course, ask an instructor to be added.
Note that the following schedule is subject to change throughout the semester.
| Date | Topic | Resources |
|---|---|---|
| 01/13 | Lecture 1: Introduction to NLP | [slides] |
| 01/15 | Lecture 2: Text Classification | [slides] |
| 01/20 | Lecture 3: Text Classification II | |
| 01/22 | Lecture 4: Neural Networks | |
| 01/27 | Lecture 5: Representation Learning | |
| 01/29 | Lecture 6: Recurrent Neural Networks | |
| 02/03 | Lecture 7: Attention and Transformers | |
| 02/05 | Lecture 8: Transformers II | |
| 02/10 | Lecture 9: Computational Linguistics | |
| 02/12 | Lecture 10: Morphology | |
| 02/17 | Lecture 11: Syntax | |
| 02/19 | Lecture 12: Syntax II | |
| 02/24 | Lecture 13: Semantics | |
| 02/26 | Lecture 14: Pragmatics | |
| 03/03 | Lecture 15: Discourse | |
| 03/05 | Lecture 16: Language Modeling | |
| 03/10 | Lecture 17: Transformer Language Models | |
| 03/12 | Lecture 18: Scaling | |
| 03/17 | Spring Break: No class | |
| 03/19 | Spring Break: No class | |
| 03/24 | Lecture 19: Prompting | |
| 03/26 | Lecture 20: Retrieval and Agents | |
| 03/31 | Lecture 21: Fine-tuning | |
| 04/02 | Lecture 22: Distillation | |
| 04/07 | Lecture 23: Quantization | |
| 04/09 | Lecture 24: Reinforcement Learning | |
| 04/14 | Lecture 25: Reinforcement Learning II | |
| 04/16 | Lecture 26: Multi-modal NLP | |
| 04/21 | Lecture 27: Multi-modal NLP II | |
| 04/23 | TBA | |
| 04/28 | TBA | |
| 04/30 | TBA | |
Assignments are to be submitted by the due date listed. Each person is allowed a total of 5 late days, which may be applied to any combination of assignments during the semester without penalty. After that, a late penalty of 15% per day will be applied; for example, a submission that is 2 days late after your 5 free late days are exhausted incurs a 30% penalty. Use of a partial day counts as a full day.
Use of extension days must be stated explicitly in the late submission (either directly in the submission header or in an accompanying email to the TA); otherwise, late penalties will apply. Extension days cannot be used after the final day of classes.
Extension days cannot be rearranged after they are applied to a submission. Use them wisely!
Assignments will NOT be accepted if they are more than five days late. Additional extensions will be granted only for serious and documented medical or family emergencies.
Please read the departmental academic integrity policy. It will be followed unless we provide written documentation of exceptions. We encourage you to interact with one another: you may discuss and obtain help with basic concepts covered in lectures or the textbook, with the homework specification (but not its solution), and with program implementation (but not design). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and may result in failure. We use copy-detection software, so do not copy code (whether from the Web or from other students) and make superficial changes. You are expected to take reasonable precautions to prevent others from using your work.
Students are permitted to use generative AI tools such as ChatGPT if they find them helpful. These tools can accelerate low-level tasks, such as writing boilerplate code. However, we urge students to be wary of model output on some tasks. These tools can be very effective for paraphrasing or correcting grammar, but they do produce errors on other tasks, such as analyzing research papers or scientifically scrutinizing an experimental setup. Be especially mindful when using such tools to generate code, as they will insert bugs (often making unnatural, non-human mistakes that can sometimes be very difficult to detect). Over-relying on AI can leave you poorly prepared for the final exam, which is a significant portion of the grade.