Week 8 · Lesson 7

Collaborative Filtering

Teaching machines to predict what you will like next

We move beyond structured tabular data into the world of recommendation systems — one of the most commercially impactful applications of deep learning.

CP3501 – Deep Learning · James Cook University · Semester 1, 2025

Recap: Where We Have Been

Weeks 1–6

Transfer learning with images (ResNet, FastAI)
ML fundamentals — loss, metrics, overfitting
Gradient descent — the engine of learning
NLP with Transformers (Hugging Face)
Tabular deep learning (Titanic)

Week 7

Embeddings — turning discrete categories into dense numeric vectors. This is the key idea we build on today.

The common thread

Build It → Understand It → Apply It

We always start with working code
Then we open the hood
Then we reason about design choices

Today

Collaborative Filtering — a domain where embeddings are the entire model, not just a preprocessing step.

What is a Recommender System?

A recommender system predicts a user's preference or rating for items they have not yet encountered.

Real-world examples

Netflix — "Because you watched..."
Spotify — Discover Weekly playlist
Amazon — "Customers also bought"
YouTube — next video autoplay

Two main families

Content-based — use item features (genre, director…)
Collaborative Filtering — use patterns across many users
No item features needed — just the rating data

The Rating Matrix

Collaborative filtering starts with a user–item rating matrix. Each cell is a known rating; most cells are unknown.

User \ Movie	Inception	Toy Story	The Matrix	Frozen	Interstellar
Alice	5	?	4	?	5
Bob	?	5	?	4	?
Carol	3	4	?	5	2
Dave	?	?	5	?	4

The core problem

Given the known ratings (numbers), predict the missing ones (?). This is a matrix completion problem.

Key insight

Alice and Dave both love Sci-Fi: each rates The Matrix and Interstellar highly. So Alice's 5 for Inception is a good signal for Dave's missing Inception rating.
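A minimal sketch of how the table above could be held in memory, assuming NumPy and using NaN for the unknown cells. The variable names are illustrative only.

```python
import numpy as np

users = ["Alice", "Bob", "Carol", "Dave"]
movies = ["Inception", "Toy Story", "The Matrix", "Frozen", "Interstellar"]

nan = np.nan  # marks an unknown rating ("?")
R = np.array([
    [5,   nan, 4,   nan, 5],    # Alice
    [nan, 5,   nan, 4,   nan],  # Bob
    [3,   4,   nan, 5,   2],    # Carol
    [nan, nan, 5,   nan, 4],    # Dave
])

known = ~np.isnan(R)              # True where a rating exists
n_known = int(known.sum())        # observed ratings
n_missing = int((~known).sum())   # cells the model must predict
print(n_known, n_missing)
```

Even in this toy example almost half the cells are missing. Real datasets are far emptier.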

The Sparsity Problem

In reality the matrix is extremely sparse. Netflix has ~200M users and ~15,000 titles. Each user rates a tiny fraction.

Sparsity example

200,000,000 users × 15,000 titles = 3 trillion possible ratings
A user who rates 50 movies fills only about 0.33% of their row (50 / 15,000)
If that were typical, over 99.6% of the matrix would be empty!
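The arithmetic can be checked in a few lines. This is a sketch that assumes every user rates exactly 50 titles, which is an illustrative figure rather than a measured one.

```python
n_users = 200_000_000
n_titles = 15_000
ratings_per_user = 50   # assumed, for illustration only

total_cells = n_users * n_titles          # size of the full rating matrix
row_fill = ratings_per_user / n_titles    # fraction of one user's row that is filled

print(f"{total_cells:,} possible ratings")
print(f"{row_fill:.2%} of a row filled")
print(f"{1 - row_fill:.2%} of the matrix empty")
```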

Why CF still works

Even sparse data contains strong patterns. Users who agree on rated items tend to agree on unrated ones too.

Check Your Understanding

Knowledge Check 1

Collaborative Filtering makes predictions based on:

Knowledge Check 2

In a user–item rating matrix, most cells are empty. What does each empty cell represent?

Matrix Factorisation: The Core Idea

We approximate the rating matrix R as the product of two much smaller matrices, one for users (U) and one for items (V): R ≈ U × Vᵀ.

User embedding row

A vector of k numbers that encodes Alice's taste profile across k hidden dimensions.

Item embedding column

A vector of k numbers encoding how much a movie "contains" each hidden dimension.
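A small NumPy sketch of the shapes involved. The sizes and the random values are placeholders; in practice the two matrices are learned from the ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2        # tiny sizes, for illustration

U = rng.normal(size=(n_users, k))    # one k-dimensional row per user
V = rng.normal(size=(n_items, k))    # one k-dimensional row per item (a column of V.T)

R_hat = U @ V.T                      # every predicted rating at once
print(R_hat.shape)                   # one cell per (user, item) pair
```

Each cell of R_hat is the dot product of one user row with one item row, so 4 × 5 = 20 predictions come from only (4 + 5) × 2 = 18 numbers.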

Latent Factors: What Do They Capture?

We never label the latent dimensions — the model discovers them automatically. But after training we can interpret what they seem to represent.

The magic of latent factors

Each user gets a k-dimensional vector. Each item gets a k-dimensional vector. A user who scores high on "Sci-Fi" will be matched with movies that also score high on "Sci-Fi" — even though neither label was ever given to the model.

Predicting Ratings: The Dot Product

To predict how much user u will like item i, we compute the dot product of their embedding vectors.

r̂_ui = u_u · v_i = Σ_{f=1..k} u_uf × v_if

High dot product

User and item point in similar directions — they share common "tastes". High predicted rating.

Low (or negative) dot product

User and item point in different directions — mismatched preferences. Low predicted rating.
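A worked example with made-up two-dimensional embeddings. The factor labels and the numbers are hypothetical; a trained model would choose its own.

```python
import numpy as np

# Hypothetical factors: [sci-fi, family]
alice     = np.array([0.9, -0.2])
inception = np.array([0.8, -0.1])
frozen    = np.array([-0.3, 0.9])

high = float(alice @ inception)   # 0.9*0.8 + (-0.2)*(-0.1) = 0.74
low  = float(alice @ frozen)      # 0.9*(-0.3) + (-0.2)*0.9 = -0.45
print(high, low)
```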

Learning Embeddings via Gradient Descent

The embedding values are just learnable parameters — exactly like weights in a neural network.

Training loop

Forward: look up user embedding + item embedding → dot product → predicted rating
Loss: compare prediction to actual rating (MSE)
Backward: compute gradients w.r.t. embedding values
Update: nudge embeddings to reduce loss

Loss = MSE = (1/N) Σ (r_ui − r̂_ui)²
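The four steps can be sketched in plain NumPy on a toy set of eleven ratings. This is a hand-written illustration using full-batch gradient descent, not the code FastAI runs, and the learning rate and epoch count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed ratings as parallel arrays: user index, item index, rating
users = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3])
items = np.array([0, 2, 4, 1, 3, 0, 1, 3, 4, 2, 4])
r = np.array([5, 4, 5, 5, 4, 3, 4, 5, 2, 5, 4], dtype=float)

k, lr, n = 2, 0.05, len(r)
U = rng.normal(scale=0.1, size=(4, k))   # user embeddings
V = rng.normal(scale=0.1, size=(5, k))   # item embeddings

def mse(U, V):
    pred = (U[users] * V[items]).sum(axis=1)
    return float(np.mean((r - pred) ** 2))

loss_before = mse(U, V)
for _ in range(500):
    pred = (U[users] * V[items]).sum(axis=1)   # forward: one dot product per rating
    err = (2.0 / n) * (pred - r)               # gradient of MSE w.r.t. each prediction
    grad_U = np.zeros_like(U)
    grad_V = np.zeros_like(V)
    np.add.at(grad_U, users, err[:, None] * V[items])   # backward: accumulate per user
    np.add.at(grad_V, items, err[:, None] * U[users])   # backward: accumulate per item
    U -= lr * grad_U                                    # update
    V -= lr * grad_V
loss_after = mse(U, V)
print(loss_before, loss_after)
```

Only the rows of U and V that appear in the data receive gradient, which is why a user or item with no ratings never moves from its initial values.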

What gets learned?

All user embedding vectors (U matrix)
All item embedding vectors (V matrix)
No other model weights — this is the whole model

Number of parameters

If there are n users, m items and k factors:

n×k + m×k parameters total

e.g. 1000 users, 500 items, k=50 → 75,000 params
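The same count as a tiny helper (the function name is illustrative):

```python
def n_embedding_params(n_users: int, n_items: int, k: int) -> int:
    """Parameters in a pure dot-product model: one k-vector per user and per item."""
    return n_users * k + n_items * k

total = n_embedding_params(1000, 500, 50)
print(total)
```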

Adding Bias Terms

Some users always rate high (generous). Some movies always rate low (niche). We capture this with bias terms.

r̂_ui = u_u · v_i + b_u + b_i

User bias b_u

Alice tends to give 4–5 stars → high b_u
Bob tends to give 1–2 stars → low b_u
Captures overall generosity / harshness

Item bias b_i

The Godfather gets high ratings from everyone → high b_i
A niche documentary gets lower average ratings → lower b_i

Why bias matters

Without bias, a generous user would need abnormally large embedding values just to account for their rating habit — confusing the latent factor signal.
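A sketch of the biased prediction with hypothetical numbers. The embeddings stay the same while the two scalars shift the result.

```python
import numpy as np

def predict(u, v, b_u=0.0, b_i=0.0):
    """Dot product of the two embeddings plus the user and item biases."""
    return float(np.dot(u, v) + b_u + b_i)

u = np.array([0.5, 0.1])     # hypothetical user factors
v = np.array([0.4, -0.2])    # hypothetical item factors

base = predict(u, v)                        # 0.5*0.4 + 0.1*(-0.2) = 0.18
shifted = predict(u, v, b_u=0.8, b_i=0.3)   # 0.18 + 0.8 + 0.3 = 1.28
print(base, shifted)
```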

FastAI handles this automatically

collab_learner(dls, n_factors=50, y_range=(0,5.5), use_nn=False) includes bias by default.

FastAI: Loading the MovieLens Dataset

FastAI's untar_data helper downloads the MovieLens 100K dataset: 100,000 ratings from 943 users on 1,682 movies. Perfect for in-class training.

from fastai.collab import *
from fastai.tabular.all import *

# Download the MovieLens 100K dataset (needs internet once, then cached locally)
path = untar_data(URLs.ML_100k)
# OR, for a much smaller sample (ratings stored in ratings.csv instead of u.data):
# path = untar_data(URLs.ML_SAMPLE)

# The ratings file is tab-separated with no header row: user, movie, rating, timestamp
ratings = pd.read_csv(path/'u.data', delimiter='\t',
                      header=None,
                      names=['user', 'movie', 'rating', 'timestamp'])
ratings.head()

What we need

A column of user IDs
A column of item IDs
A column of ratings (our target)
That is all CF needs — no item features!

Note on Colab

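URLs.ML_SAMPLE is the safest option for in-class use: it downloads quickly and trains in seconds. Its ratings are stored in ratings.csv (with a header row), so the read_csv call above needs adjusting if you use it.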

FastAI: Building the DataLoader

# CollabDataLoaders handles the embedding index lookup for you
dls = CollabDataLoaders.from_df(
    ratings,
    user_name='user',        # column of user IDs
    item_name='movie',       # column of item IDs
    rating_name='rating',    # column of ratings (target)
    valid_pct=0.2,           # 20% validation split
    seed=42
)

dls.show_batch()   # preview: user, movie, rating rows

What CollabDataLoaders does internally

Assigns a contiguous integer index to each unique user ID
Assigns a contiguous integer index to each unique movie ID
These indices are used to look up the embedding row for each user/item
The original IDs (e.g. userId=874) are not used directly — the index is

Embedding lookup: E[index] → returns the k-dimensional vector for that user/item
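A rough sketch of the index mapping and lookup, written by hand rather than taken from FastAI. The raw IDs and the stand-in embedding table are invented, and this version does not reserve any index for unknown IDs.

```python
import numpy as np

raw_user_ids = [874, 12, 874, 305, 12]   # hypothetical IDs as they appear in the data

# Map each distinct raw ID to a contiguous index 0..n-1
unique_ids = sorted(set(raw_user_ids))
id_to_idx = {uid: i for i, uid in enumerate(unique_ids)}
idx = np.array([id_to_idx[uid] for uid in raw_user_ids])

k = 3
n = len(unique_ids)
E = np.arange(n * k, dtype=float).reshape(n, k)   # stand-in embedding table

vectors = E[idx]    # one k-dimensional row per rating in the batch
print(idx)
print(vectors.shape)
```

Contiguous indices matter because the embedding table has exactly one row per distinct user. Indexing by the raw ID 874 would need a table with at least 875 rows.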

FastAI: Training the Collaborative Filter

# Create a matrix factorisation learner (dot-product model)
learn = collab_learner(
    dls,
    n_factors=50,        # k = 50 latent dimensions
    y_range=(0, 5.5),    # squash predictions into this range with a sigmoid
    wd=0.1              # weight decay (L2 regularisation)
)

# Find a good learning rate
learn.lr_find()

# Train with 1-cycle policy
learn.fit_one_cycle(5, 5e-3, wd=0.1)

y_range explained

Applies a sigmoid scaled to (0, 5.5), so every prediction falls strictly inside that range. We use 5.5 rather than 5.0 because a sigmoid only approaches its upper limit: with a ceiling of 5.0 the model could never quite predict a full 5-star rating.
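A NumPy sketch of the scaling, assuming the usual form sigmoid(x) × (hi − lo) + lo. The helper is written here for illustration.

```python
import numpy as np

def sigmoid_range(x, lo, hi):
    """Squash any real number into the open interval (lo, hi)."""
    return 1.0 / (1.0 + np.exp(-x)) * (hi - lo) + lo

raw = np.array([-10.0, 0.0, 10.0])
wide = sigmoid_range(raw, 0, 5.5)      # stays strictly inside (0, 5.5)
narrow = sigmoid_range(raw, 0, 5.0)    # never quite reaches 5.0

# With the 5.5 ceiling, a modest raw score of ln(10) ≈ 2.3 already gives exactly 5 stars
five_stars = sigmoid_range(np.log(10.0), 0, 5.5)
print(wide, narrow, five_stars)
```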

Weight decay

Prevents embedding values from growing too large — a key regularisation technique for CF. Without it, the model can memorise training ratings.
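A sketch of what weight decay does to a single update. It assumes the penalty is written as (wd/2) × Σp², so that its gradient is wd × p. FastAI's optimisers may apply the decay in a slightly different but related way.

```python
import numpy as np

def sgd_step(p, grad, lr=0.1, wd=0.1):
    """One SGD update with L2 weight decay: the penalty adds wd * p to the gradient."""
    return p - lr * (grad + wd * p)

p = np.array([4.0, -3.0])                 # hypothetical embedding values
p_next = sgd_step(p, np.zeros_like(p))    # no data gradient at all
print(p_next)                             # each value shrinks by 1% towards zero
```

Values that the data keeps pushing on survive the shrinkage. Values that only fit noise decay away.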

Check Your Understanding

Knowledge Check 3

What is n_factors in collab_learner?

Knowledge Check 4

Why do we set y_range=(0, 5.5) rather than (0, 5.0) when the highest possible rating is 5?

Interpreting Learned Embeddings

After training, we can extract and analyse the embedding vectors to understand what the model has learned.

# Extract item (movie) embeddings; detach from autograd and move to the CPU
movie_emb = learn.model.i_weight.weight.detach().cpu()   # one row per movie index

# Project the embeddings onto their first three principal components
movie_pca = movie_emb.pca(3)    # reduce to 3D for visualisation
fac0, fac1, fac2 = movie_pca.t()

# List the movies with the lowest / highest values on component 0.
# The classes are keyed by the item column name ('movie' in our DataLoaders),
# so these are movie IDs unless titles were merged in beforehand.
idxs = fac0.argsort()
[learn.dls.classes['movie'][i] for i in idxs[:5]]     # lowest
[learn.dls.classes['movie'][i] for i in idxs[-5:]]    # highest

What you typically find

Factor extremes often align with recognisable genres or quality levels
Movies close together in embedding space tend to be recommended interchangeably
This is latent factor discovery — no labels were ever given
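One way to make "close together in embedding space" concrete is cosine similarity. The sketch below uses a hand-made 4 × 2 embedding table. With a trained model the rows would come from the learned item embeddings instead.

```python
import numpy as np

def nearest(emb, idx, top=2):
    """Indices of the `top` rows most cosine-similar to row `idx`, excluding itself."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf                  # never return the query row itself
    return np.argsort(-sims)[:top].tolist()

emb = np.array([
    [0.9, 0.1],     # item 0
    [0.8, 0.2],     # item 1: points almost the same way as item 0
    [-0.7, 0.6],    # item 2: points the opposite way on the first factor
    [0.1, 0.9],     # item 3
])
neighbours = nearest(emb, 0)
print(neighbours)
```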

Neural Collaborative Filtering

Instead of a simple dot product, we can concatenate the embeddings and pass them through a neural network.

# Switch to neural CF with use_nn=True
learn_nn = collab_learner(dls, use_nn=True,
                          emb_szs={'user': 50, 'movie': 50},  # keys match the dls column names
                          layers=[128, 64],  # MLP hidden layer sizes
                          y_range=(0, 5.5))

Dot product vs neural CF

The dot product model is simpler and often competitive. Neural CF adds capacity to learn non-linear interaction patterns, but needs more data and careful tuning.
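A forward pass through a one-hidden-layer network, sketched in NumPy to show the concatenation step. The weights are random and untrained, and the layer sizes are arbitrary. The real FastAI model is more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
user_vec = rng.normal(size=k)   # stand-in user embedding
item_vec = rng.normal(size=k)   # stand-in item embedding

# Randomly initialised weights for a small MLP (untrained)
W1, b1 = rng.normal(scale=0.3, size=(8, 2 * k)), np.zeros(8)
W2, b2 = rng.normal(scale=0.3, size=(1, 8)), np.zeros(1)

x = np.concatenate([user_vec, item_vec])   # shape (2k,): no dot product involved
h = np.maximum(0.0, W1 @ x + b1)           # ReLU hidden layer
raw = (W2 @ h + b2)[0]                     # single unbounded output
rating = 5.5 / (1.0 + np.exp(-raw))        # squash into (0, 5.5)
print(x.shape, rating)
```

Because the network sees both vectors side by side, the user and item embeddings no longer need to have the same length.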

Limitations: The Cold Start Problem

New user cold start

A brand-new user has no ratings. Their embedding is random (or zeroed). The model cannot make good predictions yet.

Solutions: ask for initial ratings ("onboarding"), use demographic fallback, or use popular-item heuristics.
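A popular-item fallback might look like the sketch below. The function name, the minimum-count threshold, and the toy ratings are all illustrative assumptions.

```python
from collections import defaultdict

def popular_items(ratings, min_count=2, top=3):
    """Rank items by mean rating, ignoring items with fewer than min_count ratings."""
    by_item = defaultdict(list)
    for _user, item, rating in ratings:
        by_item[item].append(rating)
    means = {i: sum(r) / len(r) for i, r in by_item.items() if len(r) >= min_count}
    # Highest mean first; ties broken alphabetically so the result is deterministic
    return sorted(means, key=lambda i: (-means[i], i))[:top]

ratings = [
    ("alice", "Inception", 5), ("carol", "Inception", 3),
    ("bob", "Toy Story", 5), ("carol", "Toy Story", 4),
    ("alice", "The Matrix", 4), ("dave", "The Matrix", 5),
    ("bob", "Frozen", 4), ("carol", "Frozen", 5),
    ("alice", "Interstellar", 5), ("carol", "Interstellar", 2), ("dave", "Interstellar", 4),
]
fallback = popular_items(ratings)   # shown to a user with no rating history
print(fallback)
```

This recommends the same list to every new user, so it is only a stopgap until a few real ratings arrive.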

New item cold start

A newly added movie has no ratings, so no meaningful embedding can be learned for it. The model has no basis for recommending it, even if it is excellent.

Solutions: use content features to bootstrap, or hybrid content + CF approaches.

Ethics in Recommender Systems

Recommender systems operate at massive scale — their design decisions affect millions of people.

Filter bubbles

CF amplifies existing preferences — users see more of what they already like
Can reinforce narrow worldviews, limit exposure to diverse content
Related: news recommendation and political polarisation

Popularity bias

Popular items accumulate more ratings → better embeddings → more recommendations
Niche content is systematically under-recommended
Disadvantages new creators and minority-interest content

Engagement vs wellbeing

Optimising for clicks or watch-time is not the same as optimising for user wellbeing. Models can learn to exploit psychological biases (outrage, fear) because they drive engagement metrics.

Privacy

CF requires storing detailed behavioural data about every user. Even "anonymous" IDs can be re-identified from rating patterns.

SLO4 link

Always document what your recommendation objective optimises, and what harms may result from that choice.

Week 8 Summary

Key concepts today

Collaborative Filtering — predict ratings from shared user-item patterns
Rating matrix — sparse, most values unknown
Matrix factorisation — decompose into user × item embeddings
Latent factors — hidden dimensions learned from data
Dot product — similarity between user and item embeddings
Bias terms — capture global generosity / quality effects
Neural CF — replace dot product with an MLP
Cold start — the fundamental limitation of CF

FastAI tools used

CollabDataLoaders.from_df()
collab_learner()
learn.model.i_weight.weight
use_nn=True for neural CF

Next up — Practical Workshop

You will train a CF model on MovieLens, inspect the learned embeddings, and compare dot-product vs neural CF performance.

Coming up — Week 9

Convolutional Neural Networks — how spatial structure in images is captured by learned filters.
