We move beyond structured tabular data into the world of recommendation systems — one of the most commercially impactful applications of deep learning.
Embeddings — turning discrete categories into dense numeric vectors. This is the key idea we build on today.
Collaborative Filtering — a domain where embeddings are the entire model, not just a preprocessing step.
A recommender system predicts a user's preference or rating for items they have not yet encountered.
Collaborative filtering starts with a user–item rating matrix. Each cell is a known rating; most cells are unknown.
| User \ Movie | Inception | Toy Story | The Matrix | Frozen | Interstellar |
|---|---|---|---|---|---|
| Alice | 5 | ? | 4 | ? | 5 |
| Bob | ? | 5 | ? | 4 | ? |
| Carol | 3 | 4 | ? | 5 | 2 |
| Dave | ? | ? | 5 | ? | 4 |
Given the known ratings (numbers), predict the missing ones (?). This is a matrix completion problem.
Alice and Dave both rate Sci-Fi highly (The Matrix, Interstellar). So Alice's 5 for Inception is a good signal for Dave's missing Inception rating.
In reality the matrix is extremely sparse. Netflix has ~200M users and ~15,000 titles. Each user rates a tiny fraction.
Even sparse data contains strong patterns. Users who agree on rated items tend to agree on unrated ones too.
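As a rough illustration of the table above, the sketch below stores the toy ratings with `None` for unknown cells, measures how much of the matrix is observed, and compares two users on the movies both have rated. The helper names `density` and `mean_abs_gap` are made up for this example and do not come from any library.

```python
# Toy rating matrix from the table above; None marks an unknown rating.
movies = ["Inception", "Toy Story", "The Matrix", "Frozen", "Interstellar"]
ratings = {
    "Alice": [5, None, 4, None, 5],
    "Bob":   [None, 5, None, 4, None],
    "Carol": [3, 4, None, 5, 2],
    "Dave":  [None, None, 5, None, 4],
}

def density(matrix):
    """Fraction of cells that hold a known rating."""
    cells = [r for row in matrix.values() for r in row]
    return sum(r is not None for r in cells) / len(cells)

def mean_abs_gap(a, b):
    """Average absolute rating difference on co-rated items (None if no overlap)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

print(density(ratings))                                  # 11 of 20 cells are known
print(mean_abs_gap(ratings["Alice"], ratings["Dave"]))   # smaller gap: similar taste
print(mean_abs_gap(ratings["Alice"], ratings["Carol"]))  # larger gap: different taste
```

Real systems use far sparser matrices and more robust similarity measures; the point here is only that agreement on co-rated items is measurable even when most cells are empty.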
We decompose the rating matrix R into two smaller matrices: one for users, one for items.
User embedding: a vector of k numbers that encodes a user's taste profile (Alice's, say) across k hidden dimensions.
Item embedding: a vector of k numbers encoding how strongly a movie expresses each hidden dimension.
We never label the latent dimensions — the model discovers them automatically. But after training we can interpret what they seem to represent.
Each user gets a k-dimensional vector. Each item gets a k-dimensional vector. A user who scores high on "Sci-Fi" will be matched with movies that also score high on "Sci-Fi" — even though neither label was ever given to the model.
To predict how much user u will like item i, we compute the dot product of their embedding vectors.
User and item point in similar directions — they share common "tastes". High predicted rating.
User and item point in different directions — mismatched preferences. Low predicted rating.
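The dot-product prediction can be sketched in a few lines. The vectors and the dimension labels in the comment are hypothetical, chosen only to show one aligned and one opposed pair.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical k=3 embeddings; the dimension labels are only for intuition.
#            sci-fi  family  dark
alice     = [ 0.9,   -0.2,   0.5]
inception = [ 0.8,   -0.1,   0.6]
frozen    = [-0.3,    0.9,  -0.4]

print(dot(alice, inception))  # aligned vectors: positive score
print(dot(alice, frozen))     # opposed vectors: negative score
```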
The embedding values are just learnable parameters — exactly like weights in a neural network.
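To make "learnable parameters" concrete, here is a minimal sketch that fits embeddings to the toy table with plain stochastic gradient descent on squared error. It is an illustration under simple assumptions (k=2, a fixed learning rate, no bias, no regularisation), not how fastai trains its models.

```python
import random

random.seed(0)

# Known (user, item, rating) triples from the toy table; unknown cells are simply absent.
observed = [
    (0, 0, 5), (0, 2, 4), (0, 4, 5),             # Alice
    (1, 1, 5), (1, 3, 4),                        # Bob
    (2, 0, 3), (2, 1, 4), (2, 3, 5), (2, 4, 2),  # Carol
    (3, 2, 5), (3, 4, 4),                        # Dave
]
n_users, n_items, k = 4, 5, 2

# Embeddings start as small random numbers, like any other weights.
U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))

def mse():
    return sum((predict(u, i) - r) ** 2 for u, i, r in observed) / len(observed)

loss_before = mse()
lr = 0.02
for epoch in range(500):
    for u, i, r in observed:
        err = predict(u, i) - r
        for f in range(k):
            u_f, v_f = U[u][f], V[i][f]
            U[u][f] -= lr * err * v_f  # gradient of 0.5 * err**2 w.r.t. U[u][f]
            V[i][f] -= lr * err * u_f  # gradient of 0.5 * err**2 w.r.t. V[i][f]
loss_after = mse()

print(loss_before, loss_after)  # training error before and after fitting
print(predict(3, 0))            # the model's guess for Dave's missing Inception rating
```

Only the observed cells contribute to the loss; the missing cells are filled in afterwards by calling `predict`.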
If there are n users, m items and k factors:
n×k + m×k parameters total
e.g. 1000 users, 500 items, k=50 → 75,000 params
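The count above is simple enough to check directly. The function name `n_params` is just a label for this example.

```python
def n_params(n_users, n_items, k):
    """Number of embedding values in a dot-product model with k factors."""
    return n_users * k + n_items * k

print(n_params(1000, 500, 50))  # 75000 learnable values
print(1000 * 500)               # 500000 cells in the full rating matrix
```

The factorised model has far fewer parameters than the matrix has cells, which is what forces it to share information across users and items.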
Some users rate everything high (generous). Some movies are rated low by almost everyone. We capture these habits with bias terms: one scalar per user and one per item.
Without bias, a generous user would need abnormally large embedding values just to account for their rating habit — confusing the latent factor signal.
collab_learner(dls, n_factors=50, y_range=(0,5.5), use_nn=False) includes bias by default.
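A minimal sketch of the usual dot-product-plus-bias form follows. The numbers are hypothetical, and the function only illustrates the idea rather than reproducing fastai's internal model code.

```python
def predict(user_vec, item_vec, user_bias, item_bias):
    """Dot product of the embeddings plus one scalar bias per user and per item."""
    interaction = sum(a * b for a, b in zip(user_vec, item_vec))
    return interaction + user_bias + item_bias

# Hypothetical values: same taste vectors, different rating habits.
taste, movie = [0.9, -0.2], [0.8, 0.1]
print(predict(taste, movie, user_bias=0.0, item_bias=0.0))   # taste match only
print(predict(taste, movie, user_bias=1.5, item_bias=0.0))   # generous user
print(predict(taste, movie, user_bias=0.0, item_bias=-1.0))  # movie that tends to rate low
```

Because the bias absorbs the rating habit, the embedding vectors stay free to describe taste.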
fastai provides one-line access to the MovieLens 100K dataset: 100,000 ratings from 943 users on 1,682 movies. It is small enough to train in class.
    from fastai.collab import *
    from fastai.tabular.all import *

    # Download and extract MovieLens 100K (cached locally after the first run)
    path = untar_data(URLs.ML_100k)
    # Alternative: URLs.ML_SAMPLE is a small subset that stores its ratings
    # in ratings.csv (columns userId, movieId, rating, timestamp)

    # u.data is tab-separated with no header: user, movie, rating, timestamp
    ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                          names=['user', 'movie', 'rating', 'timestamp'])
    ratings.head()
URLs.ML_SAMPLE is the safest option for in-class use — it downloads quickly and trains in seconds.
    # CollabDataLoaders handles the embedding index lookup for you
    dls = CollabDataLoaders.from_df(
        ratings,
        user_name='user',      # column of user IDs
        item_name='movie',     # column of item IDs
        rating_name='rating',  # column of ratings (target)
        valid_pct=0.2,         # 20% validation split
        seed=42
    )
    dls.show_batch()  # preview: user, movie, rating rows
    # Create a matrix factorisation learner (dot-product model)
    learn = collab_learner(
        dls,
        n_factors=50,      # k = 50 latent dimensions
        y_range=(0, 5.5),  # scaled sigmoid keeps predictions inside this range
        wd=0.1             # weight decay (L2 regularisation)
    )

    # Find a good learning rate
    learn.lr_find()

    # Train with 1-cycle policy
    learn.fit_one_cycle(5, 5e-3, wd=0.1)
Applies a sigmoid scaled to (0, 5.5). This prevents predicting ratings below 0 or far above 5. We use 5.5 not 5.0 to avoid the sigmoid saturating at the boundary.
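The effect of the upper bound can be checked numerically. The function below is a plain-Python sketch of a scaled sigmoid, written for this example; it is modelled on the idea behind `y_range`, not copied from fastai.

```python
import math

def sigmoid_range(x, lo, hi):
    """Squash any real number x into the open interval (lo, hi)."""
    return lo + (hi - lo) / (1 + math.exp(-x))

# Columns: raw score, output with hi=5.0, output with hi=5.5
for x in (-5.0, 0.0, 2.3, 5.0):
    print(x, round(sigmoid_range(x, 0, 5.0), 3), round(sigmoid_range(x, 0, 5.5), 3))

# With hi=5.5, an output of 5.0 needs a raw score of ln(10), about 2.3.
# With hi=5.0, no finite raw score reaches 5.0 in exact arithmetic.
print(sigmoid_range(math.log(10), 0, 5.5))
```

With the wider range, a 5-star prediction sits at a moderate raw score instead of requiring the model to push its output toward infinity.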
Prevents embedding values from growing too large — a key regularisation technique for CF. Without it, the model can memorise training ratings.
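The shrinking effect can be seen in a single-weight sketch. This assumes plain SGD with an L2 penalty; fastai's default optimiser may apply the decay in a different (decoupled) form, so treat the update rule here as illustrative.

```python
def sgd_step(w, grad, lr, wd):
    """One SGD update with L2 weight decay: the wd * w term pulls w toward zero."""
    return w - lr * (grad + wd * w)

w = 4.0  # hypothetical embedding value that has grown large
for _ in range(3):
    # With a zero data gradient, decay alone shrinks the weight by 1% per step here.
    w = sgd_step(w, grad=0.0, lr=0.1, wd=0.1)
    print(w)
```

A weight only stays large if the data gradient keeps pushing it there, which makes memorising individual ratings more expensive for the model.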
What does n_factors control in collab_learner?
Why use y_range=(0, 5.5) rather than (0, 5.0) when ratings go from 0 to 5?
After training, we can extract and analyse the embedding vectors to understand what the model has learned.
    # Extract item (movie) embeddings; row i belongs to learn.dls.classes['movie'][i]
    movie_emb = learn.model.i_weight.weight.detach().cpu()

    # Project onto the top 3 principal components for visualisation
    movie_pca = movie_emb.pca(3)
    fac0, fac1, fac2 = movie_pca.t()

    # Movies with the lowest / highest values on the first component
    # (these are movie IDs, because the DataLoaders were built on the 'movie' column)
    idxs = fac0.argsort()
    [learn.dls.classes['movie'][i] for i in idxs[:5]]   # lowest
    [learn.dls.classes['movie'][i] for i in idxs[-5:]]  # highest
Instead of a simple dot product, we can concatenate the embeddings and pass them through a neural network.
    # Switch to neural CF with use_nn=True
    learn_nn = collab_learner(
        dls,
        use_nn=True,
        emb_szs={'user': 50, 'movie': 50},  # keys must match the column names given to from_df
        layers=[128, 64],                   # MLP hidden layer sizes
        y_range=(0, 5.5)
    )
The dot product model is simpler and often competitive. Neural CF adds capacity to learn non-linear interaction patterns, but needs more data and careful tuning.
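For intuition, the forward pass of a tiny neural CF model can be written out by hand. Everything here is hypothetical: 2-dimensional embeddings, one hidden layer of two units, and hand-picked weights. A real model learns all of these values.

```python
import math

def relu(xs):
    return [max(0.0, x) for x in xs]

def linear(xs, weights, bias):
    """Dense layer: one row of weights per output unit."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]

def neural_cf(user_vec, item_vec, W1, b1, W2, b2, lo=0.0, hi=5.5):
    x = user_vec + item_vec        # concatenate the two embeddings
    h = relu(linear(x, W1, b1))    # hidden layer with a non-linearity
    raw = linear(h, W2, b2)[0]     # single output unit
    return lo + (hi - lo) / (1 + math.exp(-raw))  # scale into the rating range

# Hypothetical embeddings and weights, purely for illustration.
user, item = [0.9, -0.2], [0.8, 0.1]
W1 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, -1.0]]
b2 = [0.0]
print(neural_cf(user, item, W1, b1, W2, b2))
```

Unlike the dot product, nothing forces the user and item vectors to interact factor by factor; the hidden layer decides how they combine.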
A brand-new user has no ratings. Their embedding is random (or zeroed). The model cannot make good predictions yet.
Solutions: ask for initial ratings ("onboarding"), use demographic fallback, or use popular-item heuristics.
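One of the fallbacks above, the popular-item heuristic, might look like the sketch below. The function name, the `min_count` threshold, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import defaultdict

def popular_items(ratings, min_count=2, top_n=3):
    """Rank items by mean rating, ignoring items with fewer than min_count ratings."""
    by_item = defaultdict(list)
    for _user, item, rating in ratings:
        by_item[item].append(rating)
    scored = [(sum(rs) / len(rs), item)
              for item, rs in by_item.items() if len(rs) >= min_count]
    # Ties on the mean are broken by title (descending), an arbitrary choice.
    return [item for _mean, item in sorted(scored, reverse=True)[:top_n]]

# Known ratings from the toy table, as (user, item, rating) triples.
toy = [("Alice", "Inception", 5), ("Carol", "Inception", 3),
       ("Bob", "Toy Story", 5), ("Carol", "Toy Story", 4),
       ("Alice", "The Matrix", 4), ("Dave", "The Matrix", 5),
       ("Bob", "Frozen", 4), ("Carol", "Frozen", 5),
       ("Alice", "Interstellar", 5), ("Carol", "Interstellar", 2),
       ("Dave", "Interstellar", 4)]

# A brand-new user gets the same list as everyone else until they rate something.
print(popular_items(toy))
```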
A newly added movie has no ratings. No embedding can be learned. The model will never recommend it — even if it is excellent.
Solutions: use content features to bootstrap, or hybrid content + CF approaches.
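One simple way to bootstrap from content features, sketched under the assumption that each movie carries genre tags, is to start a new movie at the average embedding of existing movies that share a genre. The function, the catalogue, and its values are hypothetical.

```python
def bootstrap_embedding(new_genres, catalogue):
    """Average the embeddings of existing items that share at least one genre."""
    similar = [emb for genres, emb in catalogue.values() if genres & new_genres]
    if not similar:
        return None  # no content overlap: fall back to another strategy
    k = len(similar[0])
    return [sum(emb[f] for emb in similar) / len(similar) for f in range(k)]

# Hypothetical learned 2-d embeddings and genre tags.
catalogue = {
    "Inception":    ({"sci-fi"}, [0.8, -0.1]),
    "Interstellar": ({"sci-fi"}, [0.6, -0.3]),
    "Frozen":       ({"family"}, [-0.3, 0.9]),
}
print(bootstrap_embedding({"sci-fi", "action"}, catalogue))
```

The starting vector is only a guess; once real ratings arrive, training can move it like any other embedding.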
Recommender systems operate at massive scale — their design decisions affect millions of people.
Optimising for clicks or watch-time is not the same as optimising for user wellbeing. Models can learn to exploit psychological biases (outrage, fear) because they drive engagement metrics.
CF requires storing detailed behavioural data about every user. Even "anonymous" IDs can be re-identified from rating patterns.
Always document what your recommendation objective optimises, and what harms may result from that choice.
CollabDataLoaders.from_df()
collab_learner()
learn.model.i_weight.weight
use_nn=True for neural CF
You will train a CF model on MovieLens, inspect the learned embeddings, and compare dot-product vs neural CF performance.
Convolutional Neural Networks — how spatial structure in images is captured by learned filters.