CS577: Natural Language Processing

Description

This graduate course will provide a research-oriented overview of the key concepts in natural language processing (NLP) and the techniques used for statistical modeling of natural language data. We will introduce and discuss several NLP tasks, including but not limited to text classification, sentiment analysis, information extraction, language modeling, syntactic parsing, and semantic analysis.

Lectures are held on Tuesdays and Thursdays from 6:00 to 7:15pm in the Physics Building, Room 203.

We will use Ed for class discussion and announcements, including announcements regarding assignments. If you are not in the Ed course, ask the instructor to be added.

Instructors and TAs

Abulhair Saparov

Instructor

Office Hours: After class

Nathaniel Getachew

Office Hours: Wednesdays, 4:30-5:30pm, DSAI B061

Yunxin Sun

Office Hours: Thursdays, 4:00-5:00pm, DSAI B047

Grading

Homework assignments: (30%) There will be 2-3 homework assignments, each requiring both answers to open questions and programming work.
Paper critique: (10%) Students will submit a critique of a recent NLP research paper.
Final project: (30%) Students will complete a final project in teams of 3-4. Each team is expected to select a topic, submit a proposal, implement the project, and submit a final report.
Final exam: (30%) There will be a written in-person final exam at the end of the course.
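
As an illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component has already been reduced to a single score between 0 and 100; the syllabus does not say how individual homework assignments are averaged, and the names WEIGHTS and course_score are hypothetical, chosen only for this example.

```python
# Component weights taken from the Grading list above.
# Assumption: every component score is on a 0-100 scale.
WEIGHTS = {
    "homework": 0.30,
    "paper_critique": 0.10,
    "final_project": 0.30,
    "final_exam": 0.30,
}


def course_score(scores: dict[str, float]) -> float:
    """Return the weighted course score (0-100) for one student."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    # Weighted sum over the four graded components.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "homework": 90,
        "paper_critique": 80,
        "final_project": 85,
        "final_exam": 70,
    }
    print(f"{course_score(example):.1f}")
```

For the example scores, the contributions are 27 + 8 + 25.5 + 21, for a total of 81.5. How a total maps to a letter grade is not specified here.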

Schedule

Note that the following schedule is subject to change throughout the course.

Date	Topic	Resources
08/26	Lecture 1: Introduction to NLP	[slides]
08/28	Lecture 2: Text Classification	[slides]
09/02	Lecture 3: Language Modeling	[slides]
09/04	Lecture 4: Recurrent Neural Networks	[slides]
09/09	Lecture 5: LSTMs and GRUs	[slides]
09/11	Lecture 6: Attention and Transformers	[slides]
09/14	Homework 1 released	[hw1.pdf] [hw1.tar.gz] [fixed yml]
09/16	Lecture 7: Transformers II	[slides]
09/18	Lecture 8: Scaling	[slides]
09/23	Lecture 9: Prompting	[slides]
09/25	Lecture 10: Efficiency	[slides]
09/26	Homework 1 due
09/30	Lecture 11: Efficiency II	[slides]
10/02	Lecture 12: Efficiency III	[slides]
10/07	Lecture 13: Reinforcement Learning	[slides]
10/09	Lecture 14: Reinforcement Learning II	[slides]
10/10	Paper Critique due	[critique.pdf]
10/14	Fall Break: No class
10/16	Lecture 15: Quantization	[slides]
10/21	Lecture 16: Pruning	[slides]
10/23	Lecture 17: Distillation	[slides]
10/27	Project Proposal due	[proposal.pdf]
10/28	Lecture 18: Mixture of Experts	[slides]
10/30	Lecture 19: Retrieval and Computational Linguistics	[slides]
11/04	Homework 2 released	[hw2.pdf]
11/04	Lecture 20: Morphology and Syntax	[slides]
11/06	Lecture 21: Syntax II	[slides]
11/11	Lecture 22: Syntax III	[slides]
11/13	Lecture 23: Syntax IV and Semantics	[slides]
11/18	Lecture 24: Language Model Agents	[slides]
11/20	Lecture 25: Semantics II	[slides]
11/21	Homework 2 due
11/25	Guest Lecture: Raymond Yeh
11/27	Thanksgiving: No class
12/02	Guest Lecture: Tianyi Zhang, "Going Beyond Linear Conversation: Enhancing AI-assisted Programming via Mutual Grounding"
12/04	Guest Lecture: Dan Goldwasser
12/09	Lecture 26: Semantics III	[slides]
12/11	Lecture 27: Multi-modal NLP	[slides]
12/18	Final Exam: LILY G126, 10:30am-12:30pm	[exam_practice.pdf]
12/20	Project Final Report due	[final_report.pdf]

Late Policy

Assignments are to be submitted by the due date listed. Each student is allowed a total of 5 late days, which can be applied to any combination of assignments during the semester without penalty. Once those are used up, a late penalty of 15% per day applies. A partial day counts as a full day.

Use of extension days must be stated explicitly in the late submission (either directly in the submission header or in an accompanying email to the TA); otherwise, late penalties will apply. Extensions cannot be used after the final day of classes (i.e., December 13th 11:59pm).

Extension days cannot be rearranged after they are applied to a submission. Use them wisely!

Assignments will NOT be accepted if they are more than five days late. Additional extensions will be granted only for serious and documented medical or family emergencies.
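
The arithmetic of the late policy can be sketched in Python as follows. This is one reading of the policy, not an official calculator. It assumes the 15% penalty is deducted per day from the assignment's score as a fraction of that score, which the policy does not spell out, and it does not model the cutoff on the final day of classes. The names days_late and late_multiplier are hypothetical.

```python
import math

LATE_DAY_BUDGET = 5      # free late days per student per semester
PENALTY_PER_DAY = 0.15   # penalty for each late day not covered by a late day
MAX_DAYS_LATE = 5        # more than five days late: not accepted


def days_late(hours_late: float) -> int:
    """Round lateness up to whole days; a partial day counts as a full day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def late_multiplier(hours_late: float, late_days_remaining: int,
                    use_late_days: bool = True):
    """Return (score multiplier, late days left after this submission).

    The multiplier is None when the submission is too late to be accepted.
    use_late_days=False models a submission that did not explicitly state
    that extension days should be applied.
    """
    days = days_late(hours_late)
    if days > MAX_DAYS_LATE:
        return None, late_days_remaining
    # Cover as many days as possible with the remaining free late days.
    free = min(days, late_days_remaining) if use_late_days else 0
    penalized = days - free
    # Assumption: the penalty is a flat 15% of the score per uncovered day.
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized)
    return multiplier, late_days_remaining - free


if __name__ == "__main__":
    print(late_multiplier(30, LATE_DAY_BUDGET))  # 2 days late, full budget
    print(late_multiplier(30, 1))                # 2 days late, 1 late day left
    print(late_multiplier(24 * 6, 5))            # 6 days late
```

Under these assumptions, a submission 30 hours late counts as 2 days late. With the full budget, both days are covered and 3 late days remain. With only 1 late day remaining, one day is covered and the other incurs a 15% penalty. A submission 6 days late is not accepted regardless of remaining late days.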

Academic Honesty

Please read the departmental academic integrity policy. This policy will be followed unless we provide written documentation of exceptions. We encourage you to interact amongst yourselves: you may discuss and obtain help with basic concepts covered in lectures or the textbook, homework specification (but not solution), and program implementation (but not design). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and could result in failure. We use copy detection software, so do not copy code (either from the Web or from other students) and then make changes to it. You are expected to take reasonable precautions to prevent others from using your work.

Policy on Use of Generative AI

Students are not only permitted but encouraged to use generative AI tools such as ChatGPT if they find the tools to be helpful. These tools can help to accelerate low-level tasks, such as writing boilerplate code. However, we urge students to be wary of the output of such models on some tasks. These tools can be very effective for tasks such as paraphrasing or correcting grammar, but they do produce errors on other tasks, such as analysis of research papers or scientific scrutiny of an experimental setup. Be very mindful when using such tools to generate code, as they will insert bugs (often making unnatural, non-human mistakes, which can sometimes be very difficult to detect).