Description

This graduate course will provide a research-oriented overview of the key concepts in natural language processing (NLP) and the techniques used for statistical modeling of natural language data. We will introduce and discuss several NLP tasks, including but not limited to text classification, sentiment analysis, information extraction, language modeling, syntactic parsing, and semantic analysis.

Lectures are held on Tuesdays and Thursdays from 6:00-7:15pm in Physics Building Room 203.

Instructors and TAs    Office Hours
Abulhair Saparov       After class
Nathaniel Getachew     Wednesdays 4:30-5:30pm, DSAI B061
Yunxin Sun             Thursdays 4:00-5:00pm, DSAI B047


Grading

  • Homework assignments: (30%) There will be 2-3 homework assignments, each combining open-ended questions with programming tasks.
  • Paper critique: (10%) Students will submit a critique of a recent NLP research paper.
  • Final project: (30%) Students will have to submit a final project. It will be completed in teams of 3-4. Students will be expected to select a topic, submit a proposal, implement the project and submit a final report.
  • Final exam: (30%) There will be a written in-person final exam at the end of the course.
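
As an illustration of how the weights above combine, here is a minimal sketch that computes an overall course percentage. It assumes each component is scored on a 0-100 scale; the names `WEIGHTS` and `course_score` are hypothetical and not part of any official grading tool.

```python
# Component weights taken from the grading breakdown above.
WEIGHTS = {
    "homework": 0.30,
    "critique": 0.10,
    "project": 0.30,
    "exam": 0.30,
}


def course_score(scores):
    """Return the weighted course percentage.

    `scores` maps each component name in WEIGHTS to a percentage
    between 0 and 100 (an assumption made for this sketch).
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"homework": 90, "critique": 80, "project": 85, "exam": 70}
    print(round(course_score(example), 2))
```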


Schedule

Note: the following schedule is subject to change throughout the course.

Date   Topic                                                 Resources
08/26  Lecture 1: Introduction to NLP                        [slides]
08/28  Lecture 2: Text Classification                        [slides]
09/02  Lecture 3: Language Modeling
09/04  Lecture 4: Recurrent Neural Networks
09/09  Lecture 5: LSTMs and GRUs
09/11  Lecture 6: Attention and Transformers
09/16  Lecture 7: Transformers II
09/18  Lecture 8: Scaling
09/23  Lecture 9: Prompting
09/25  Lecture 10: Reinforcement Learning
09/30  Lecture 11: Reinforcement Learning II
10/02  Lecture 12: Efficiency
10/07  Lecture 13: Efficiency II
10/09  TBA
10/14  Fall Break: No class
10/16  Lecture 14: Efficiency III
10/21  Lecture 15: Efficiency IV
10/23  Lecture 16: Efficiency V
10/28  Lecture 17: Efficiency VI
10/30  Lecture 18: Mixture of Experts and Retrieval
11/04  Lecture 19: Computational Linguistics and Morphology
11/06  Lecture 20: Syntax
11/11  Lecture 21: Syntax II
11/13  Lecture 22: Syntax III
11/18  Lecture 23: Syntax IV and Semantics
11/20  Lecture 24: Semantics II
11/25  TBA
11/27  Thanksgiving: No class
12/02  TBA
12/04  Lecture 25: Semantics III
12/09  Lecture 26: Multi-modal NLP
12/11  Lecture 27: Multi-modal NLP II


Late Policy

Assignments are to be submitted by the due date listed. Each person will be allowed a total of 5 late days, which can be applied to any combination of assignments during the semester without penalty. After that, a late penalty of 15% per day will be assigned. A partial day will be counted as a full day.

Use of extension days must be stated explicitly in the late submission (either directly in the submission header or in an accompanying email to the TA); otherwise, late penalties will apply. Extensions cannot be used after the final day of classes (i.e., May 9th 11:59pm).

Extension days cannot be rearranged after they are applied to a submission. Use them wisely!

Assignments will NOT be accepted if they are more than five days late. Additional extensions will be granted only for serious and documented medical or family emergencies.
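
To make the arithmetic of the late policy concrete, here is a minimal sketch of how a late submission might be scored. It is one reading of the rules above, not an official grading script. It assumes that the student applies as many remaining late days as possible, that the 15% penalty is deducted from the raw score for each late day not covered by a late day allowance, and that lateness is measured in hours. The function name `late_outcome` and its parameters are hypothetical.

```python
import math

FREE_LATE_DAYS = 5      # total penalty-free late days per semester (from the policy)
PENALTY_PER_DAY = 0.15  # 15% per penalized day (from the policy)
MAX_DAYS_LATE = 5       # submissions more than five days late are not accepted


def late_outcome(hours_late, free_days_left, raw_score):
    """Return (final_score, free_days_left_after).

    final_score is None when the submission is too late to be accepted.
    Assumes the student uses every remaining late day they can.
    """
    if hours_late <= 0:
        return raw_score, free_days_left

    # A partial day counts as a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > MAX_DAYS_LATE:
        return None, free_days_left

    used = min(days_late, free_days_left)
    penalized_days = days_late - used
    # Assumption: the penalty is linear in the number of penalized days.
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return raw_score * multiplier, free_days_left - used


if __name__ == "__main__":
    # 25 hours late rounds up to 2 days; both are covered by late days.
    print(late_outcome(25, FREE_LATE_DAYS, 100))
    # 49 hours late rounds up to 3 days; only 1 late day remains.
    print(late_outcome(49, 1, 80))
```

Under these assumptions, a submission 49 hours late with one late day remaining is charged two penalized days, so it keeps roughly 70% of its raw score.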


Academic Honesty

Please read the departmental academic integrity policy. This will be followed unless we provide written documentation of exceptions. We encourage you to interact amongst yourselves: you may discuss and obtain help with basic concepts covered in lectures or the textbook, homework specifications (but not solutions), and program implementation (but not design). However, unless otherwise noted, work turned in should reflect your own efforts and knowledge. Sharing or copying solutions is unacceptable and could result in failure. We use copy detection software, so do not copy code (either from the Web or from other students) and make changes to it. You are expected to take reasonable precautions to prevent others from using your work.


Policy on Use of Generative AI

Students are not only permitted but encouraged to use generative AI tools such as ChatGPT if they find the tools helpful. These tools can accelerate low-level tasks, such as writing boilerplate code. However, we urge students to be wary of the output of such models on some tasks. These tools can be very effective for tasks such as paraphrasing or correcting grammar, but they do produce errors on other tasks, such as analysis of research papers or scientific scrutiny of an experimental setup. Be very mindful when using such tools to generate code, as they will insert bugs (often making unnatural, non-human mistakes that can sometimes be very difficult to detect).