Lectures
Important notes:
- We will upload lectures prior to their corresponding classes.
- [SIT770]: Marks content intended specifically for students currently enrolled in SIT770.
-
Week 0: Course Overview
Summary: Introduction and course overview.
[slides] [slides 6up]
Video recordings (23 Minutes and 19 Seconds)
-
Week 1: Information Retrieval Part 1
Summary: Inverted indices, scoring, term weighting, and the vector space model.
[slides] [slides 6up]
Video recordings (1 Hour, 59 Minutes and 23 Seconds):
- Introduction to Information Retrieval (6:32)
- Term-document incidence matrices (7:10)
- The Inverted Index, the key data structure underlying modern IR (9:52)
- Query processing with an inverted index (5:23)
- Structured vs. Unstructured Data (3:31)
- Modeling in Information Retrieval (4:37)
- The Boolean Model (10:28)
- Phrase queries and positional indexes (10:02)
- The Vector Model (1:28)
- Ranked retrieval (6:08)
- Scoring documents (7:41)
- Term frequency (9:29)
- Collection Statistics (12:39)
- Weighting schemes (4:25)
- Vector space scoring (19:59)
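For those who want to see the week's core ideas in code, the sketch below builds a tiny inverted index and ranks documents with a simple tf-idf score. It is illustrative only, not taken from the lecture materials: the documents, names, and the decision to skip length normalization are all our own.

```python
# Minimal sketch: inverted index + tf-idf scoring over a toy collection.
# Illustrative only; length normalization (cosine) is omitted for brevity.
import math
from collections import defaultdict, Counter

docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in new home sales in july",
}

# Inverted index: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

N = len(docs)

def tf_idf(tf, df):
    # Log-weighted term frequency times inverse document frequency.
    return (1 + math.log10(tf)) * math.log10(N / df)

def score(query):
    # Accumulate tf-idf weights for every document containing a query term.
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, {})
        for doc_id, tf in postings.items():
            scores[doc_id] += tf_idf(tf, len(postings))
    return sorted(scores.items(), key=lambda x: -x[1])

print(score("sales in july"))  # documents ranked by score, best first
```

A full vector-space implementation would also length-normalize the document vectors (cosine similarity), as discussed in the vector space scoring recording.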
-
Week 2: Information Retrieval Part 2
Summary: Probabilistic IR and Evaluation methods.
[slides] [slides 6up]
Video recordings (1 Hour, 59 Minutes and 22 Seconds):
- Probabilistic IR model (1 Hour, 8 Minutes and 39 Seconds)
- IR Evaluation methods (50 Minutes and 43 Seconds)
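As a small, self-contained illustration of two standard evaluation measures (not code from the lecture), the sketch below computes precision@k and average precision for a made-up ranking and set of relevance judgements.

```python
# Sketch of precision@k and average precision; ranking and judgements are toy data.
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks at which relevant documents are retrieved."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

ranking = ["d3", "d1", "d7", "d2", "d5"]   # system output, best first
relevant = {"d1", "d2", "d9"}              # ground-truth relevant documents

print(precision_at_k(ranking, relevant, 3))  # 0.333...
print(average_precision(ranking, relevant))  # 0.333...
```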
-
Week 3: Text Processing
Summary: Regular Expressions, Text Normalization, Edit Distance.
[slides] [slides 6up]
Video recordings (1 Hour, 54 Minutes and 32 Seconds):
- Regular Expressions (28 Minutes and 32 Seconds)
- Text Normalization (37 Minutes and 11 Seconds)
- [SIT770] Edit Distance (48 Minutes and 49 Seconds)
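For the [SIT770] edit-distance material, here is one possible dynamic-programming implementation with unit costs for insertion, deletion, and substitution; it is a sketch for intuition, not the lecture's reference code.

```python
# Levenshtein edit distance via dynamic programming (unit costs).
def edit_distance(source, target):
    m, n = len(source), len(target)
    # dp[i][j] = minimum edits to turn source[:i] into target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                  # delete all remaining source characters
    for j in range(n + 1):
        dp[0][j] = j                  # insert all remaining target characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[m][n]

print(edit_distance("intention", "execution"))  # 5 with unit substitution cost
```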
-
Week 4: N-gram Language Models
Summary: N-gram language models and spelling correction with the noisy channel.
[slides] [slides 6up]
Video recordings (2 Hours, 1 Minute and 23 Seconds):
- Language Models (1 Hour, 14 Minutes and 31 Seconds)
- Spelling Correction and the Noisy Channel (46 Minutes and 52 Seconds)
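To make the counting behind n-gram models concrete, here is a toy bigram model with maximum-likelihood estimates; the corpus is tiny and there is no smoothing, so unseen bigrams get probability zero. It is an illustrative sketch, not code from the lecture.

```python
# Bigram language model with maximum-likelihood estimates over a toy corpus.
from collections import Counter

corpus = [
    "<s> i am sam </s>",
    "<s> sam i am </s>",
    "<s> i do not like green eggs and ham </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p_bigram(w_prev, w):
    """P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

print(p_bigram("<s>", "i"))   # 2/3
print(p_bigram("i", "am"))    # 2/3
print(p_bigram("am", "sam"))  # 1/2
```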
-
Week 5: Vector Embeddings and Sequence Labeling
Summary: Vector Embeddings and Sequence Labeling.
[slides] [slides 6up]
Video recordings (2 Hours, 23 Minutes and 58 Seconds):
- Vector Embeddings (1 Hour and 12 Minutes)
- Sequence Labeling (1 Hour, 11 Minutes and 58 Seconds)
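As a toy illustration of how word embeddings are queried (the vectors below are made up, not trained), this sketch ranks a word's neighbours by cosine similarity.

```python
# Cosine similarity between word vectors; the embeddings here are invented toy values.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

query = "king"
neighbours = sorted(
    (w for w in embeddings if w != query),
    key=lambda w: cosine(embeddings[query], embeddings[w]),
    reverse=True,
)
print(neighbours)  # 'queen' ranks above 'apple' for these toy vectors
```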
-
Week 6: Neural Networks for NLP
Summary: Feedforward and recurrent neural networks for NLP, including GRUs, LSTMs, beam search, and attention.
[slides] [slides 6up] [Notes]
Video recordings:
- Introduction (1:01)
- Optional: Introduction to Neural Nets (1 Hour, 13 Minutes and 6 Seconds):
- Neural Networks Overview (4:26)
- Neural Network Representation (5:14)
- Computing a Neural Network’s Output (9:57)
- Vectorizing across multiple examples (9:05)
- Explanation for Vectorized Implementation (7:37)
- Activation functions (10:56)
- Derivatives of activation functions (7:57)
- Gradient descent for Neural Networks (9:57)
- Random Initialization (7:57)
- Applying feedforward networks to NLP tasks (15:32)
- Recurrent Neural Networks (2 Hours, 34 Minutes and 11 Seconds):
- Why sequence models? (3:00)
- Notation (9:15)
- Recurrent Neural Network Model (16:31)
- Different types of RNNs (8:33)
- Language model and sequence generation (12:01)
- Sampling novel sequences (8:38)
- Vanishing gradients with RNNs (6:28)
- Gated Recurrent Unit (GRU) (17:06)
- Long Short Term Memory (LSTM) (9:53)
- Bidirectional RNN (8:19)
- Deep RNNs (5:16)
- Basic Models (6:18)
- Picking the most likely sentence (8:56)
- Beam Search (11:54)
- Attention Model Intuition (9:41)
- Attention Model (12:22)
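To give a feel for the decoding ideas at the end of the recordings, here is a minimal beam-search sketch over a hard-coded toy next-token table; the table, beam size, and names are invented for illustration and do not come from the lecture materials.

```python
# Beam search over a toy bigram "model": NEXT[w] gives P(next token | w).
import math

NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"cat": 0.2, "dog": 0.7, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_size=2, max_len=4):
    # Each hypothesis is (token sequence, total log-probability).
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            if tokens[-1] == "</s>":            # keep finished hypotheses as-is
                candidates.append((tokens, logp))
                continue
            for word, p in NEXT[tokens[-1]].items():
                candidates.append((tokens + [word], logp + math.log(p)))
        # Prune to the beam_size highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda x: -x[1])[:beam_size]
    return beams

for tokens, logp in beam_search():
    print(" ".join(tokens), round(logp, 3))
```

Unlike greedy decoding, the beam keeps several partial sequences alive at each step, which is why it can prefer a sentence whose first word was not the single most likely one.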
-
Week 7: Transformers and Pretrained LMs
Summary: Transformers and Pretrained LMs.
[slides] [slides 6up]
Video recordings (2 Hours, 5 Minutes and 12 Seconds):
- Transformers: Attention Is All You Need! (1 Hour, 10 Minutes and 27 Seconds)
- Pre-trained LMs (54 Minutes and 45 Seconds)
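Below is a short NumPy sketch of scaled dot-product attention, the core operation of the Transformer ("Attention Is All You Need"); shapes and values are toy examples, and the code is illustrative rather than taken from the lecture.

```python
# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, on toy random data.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                        # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query positions, dimension 4
K = rng.normal(size=(5, 4))   # 5 key/value positions
V = rng.normal(size=(5, 4))

print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```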
-
Week 8: Large Language Models
Summary: Large Language Models.
[slides] [slides 6up]
Video recordings (1 Hour, 8 Minutes and 32 Seconds):
- Introduction to Large Language Models (6:14)
- Large Language Models: What tasks can they do? (7:46)
- Sampling for LLM Generation (9:32)
- Pretraining Large Language Models: Algorithm (5:43)
- Pretraining data for LLMs (6:35)
- Finetuning (2:46)
- Evaluating Large Language Models (3:56)
- Dealing with Scale (16:03)
- Harms of Large Language Models (9:57)
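As an illustration of the decoding knobs discussed under "Sampling for LLM Generation", the sketch below applies temperature and top-k sampling to a made-up next-token distribution; the vocabulary and logits are invented, and real systems work over vocabularies of tens of thousands of tokens.

```python
# Temperature and top-k sampling from a toy next-token distribution.
import math
import random

vocab = ["the", "cat", "sat", "on", "mat"]
logits = [2.0, 1.0, 0.5, 0.2, -1.0]     # made-up unnormalized scores

def sample(logits, temperature=1.0, top_k=None):
    # Temperature rescales the logits; top-k keeps only the k most likely tokens.
    scaled = [l / temperature for l in logits]
    order = sorted(range(len(scaled)), key=lambda i: -scaled[i])
    keep = order[:top_k] if top_k else order
    weights = {i: math.exp(scaled[i]) for i in keep}
    r, acc = random.random() * sum(weights.values()), 0.0
    for i, w in weights.items():
        acc += w
        if acc >= r:
            return vocab[i]
    return vocab[keep[-1]]

random.seed(0)
print([sample(logits, temperature=0.7, top_k=3) for _ in range(5)])
```

Lower temperatures make the distribution peakier (closer to greedy decoding), while higher temperatures and larger k make the output more varied.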
-
Week 9: Speech Processing & ASR
Summary: Speech processing and automatic speech recognition (ASR).
[slides] [slides 6up]
Video recordings (48 Minutes and 36 Seconds)
-
Week 10: Dialogue Systems & Conversational AI
Summary: Dialogue Systems & Conversational AI.
[slides] [slides 6up]
Video recordings (1 Hour, 40 Minutes and 48 Seconds):
- Introduction to Chatbots and Dialogue Systems (5:40)
- Properties of Human Conversation (17:07)
- Rule-based Chatbots - ELIZA and PARRY (12:58)
- Corpus-based Chatbots (14:51)
- The Frame-based (“GUS”) Dialogue Architecture (10:45)
- The Dialogue-State Architecture (11:13)
- The Dialogue-State Architecture Continued - Policy and Generation (13:21)
- Evaluating Dialogue Systems (8:22)
- Design and Ethical Issues (6:31)
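In the spirit of the rule-based chatbots covered this week, here is a tiny ELIZA-style sketch built from pattern matching and pronoun swapping; the rules below are a made-up subset for illustration, not ELIZA's actual script.

```python
# ELIZA-style responder: regex pattern rules plus simple pronoun reflection.
import re

RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i am (.*)", "How long have you been {0}?"),
    (r"(.*) mother(.*)", "Tell me more about your family."),
    (r"(.*)", "Please go on."),
]

SWAPS = {"my": "your", "me": "you", "i": "you", "am": "are"}

def reflect(text):
    # Swap first-person words for second-person ones in the captured fragment.
    return " ".join(SWAPS.get(w, w) for w in text.lower().split())

def respond(utterance):
    for pattern, template in RULES:
        match = re.match(pattern, utterance.lower())
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."

print(respond("I need my coffee"))   # Why do you need your coffee?
print(respond("I am feeling sad"))   # How long have you been feeling sad?
```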
