Large Language Models: Theory and Practice (Pretrained Models II)

Course Description

Large-scale pretrained models (BERT, T5, and GPT) have dramatically changed the landscape of artificial intelligence (natural language processing, computer vision, and robotics). With their huge numbers of parameters, these models encode rich knowledge and serve as effective backbones for downstream tasks, outperforming models trained from scratch.

This seminar is the second part of the “pretrained models” course and focuses on transformer-based pretrained models with encoder-decoder and decoder-only architectures. It contains four parts:

  • Part 1: Introduction
  • Part 2: Model Architecture and Learning
  • Part 3: Model Analysis and Interpretation
  • Part 4: Efficient LLMs

Instructor: Meng Li

Time: Tuesdays, 2:15-3:45pm (first meeting on April 22)

Room: 2.14.0.35

Course Management System: Moodle

Syllabus

(Note: You are welcome to suggest other topics or papers.)

Part 1: Introduction
  • 2024/04/22: Introduction to LLMs (Presenter: Meng; slides)
  • 2024/04/29: Build LLMs from Scratch (Presenter: Meng; slides)
  • 2024/05/06: no class

Part 2: Model Architecture and Learning
  • 2024/05/13: In-context Learning and Prompt Engineering
    Readings: (1) Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?; (2) An Explanation of In-context Learning as Implicit Bayesian Inference
  • 2024/05/20: RLHF and Alignment
    Readings: (1) Secrets of RLHF in Large Language Models Part I: PPO; (2) Secrets of RLHF in Large Language Models Part II: Reward Modeling
  • 2024/05/27: Mixture of Experts
    Readings: (1) GLaM: Efficient Scaling of Language Models with Mixture-of-Experts; (2) A Closer Look into Mixture-of-Experts in Large Language Models

Part 3: Model Analysis and Interpretation
  • 2024/06/03: Interpretability
    Readings: (1) How new data permeates LLM knowledge and how to dilute it; (2) Wasserstein Distances, Neuronal Entanglement, and Sparsity
  • 2024/06/10: no class (Pentecost week)

Part 4: Efficient LLMs
  • 2024/06/17: Efficient Inference
    Readings: (1) Fast inference from transformers via speculative decoding; (2) A Theoretical Perspective for Speculative Decoding Algorithm
    Related materials: Blockwise Parallel Decoding for Deep Autoregressive Models; Looking back at Speculative Decoding
  • 2024/06/24: Wrap-up

Requirements

Prerequisites. You are expected to have a solid understanding of neural networks and to have completed at least one course on natural language processing. In addition, you should be a curious and active learner, ready to prepare your presentation and take part in discussions.

Registration. If you would like to participate, register directly through PULS. In addition, please send an email to meng.li (at) uni-potsdam.de by April 27 (23:59), 2024. In your email, please:

  • State your name, semester, and major.
  • Name your top three paper choices from the syllabus for presentation.
  • Explain why you want to take this course.
  • List relevant experience in deep learning, natural language processing, or implementing NLP models.

Format. We will focus on one topic each week.

  • In the first two units, I will provide a basic introduction to LLMs.
  • From the fourth week on, there are two readings per topic and two students present in each unit. Each student presents one paper and leads the discussion or activities that follow. Each presentation should last 20-30 minutes, leaving 15-25 minutes for discussion or activities. All students are expected to read both papers every week and to submit one question per paper by Monday evening (23:59).
  • In the last week, we will share our experiences of using LLMs and discuss final projects.

Grading

Scheme. Questions about readings and discussion: 20%; presentation: 30%; applications of LLMs: 10%; final paper: 40%.

Questions. From the fourth week on, each student submits one question per paper by Monday evening (23:59) on Moodle. Questions are graded on a 3-point scale (0: no question submitted; 1: superficial question; 2: insightful question). Superficial questions merely rephrase the content of a paper, and their answers can be found in the paper itself. Insightful questions are inquisitive: they identify an information gap or a logical flaw in a paper, or connect it with earlier literature from a new perspective.

Presentations.

  • Your presentation is expected to motivate the paper with meaningful questions in a broad context, and to outline its claims and their support. Given the limited time, not every detail needs to be included.
  • Your presentation in this seminar is not supposed to be a perfect pitch. It will not affect your presentation grade if you do not understand some points of the assigned paper; rather, you are expected to be open and transparent about them. Confusing points can be a starting point for in-class discussion and for your future research.
  • Rehearse your presentation and improve it with feedback from friends or fellow students if time permits. There are numerous books and videos on presenting; feel free to learn from them and practice as needed.

Applications of LLMs. Apply LLMs to improve your study, work, or life, and write a short document (1-2 pages). Students are expected to present their cases in the last week and share their experiences.

Final paper.

Note: We will discuss this in the first meeting. Requirements may be changed based on popular demand.

The final paper should have 5 pages of main content (plus unlimited references and appendices), following the ACL template.

  • Option 1: A technical report on a small independent project.
  • Option 2: A review paper.
  • Option 3: A topic of your choice, agreed upon in discussion with me.

Both the proposal and the final paper should be uploaded on Moodle.

  • Proposal due date: June 15 (23:59), 2024
  • Final paper due date: October 5 (23:59), 2024

Contact

Please contact Meng at meng.li (at) uni-potsdam.de for any questions.