M4 · Lesson 1 — Reproducing Works

From Equations
to Pseudocode

You don't truly understand a paper until you can
translate its math into step-by-step logic.

01
M4 · L1 — Why This Matters

The gap between reading and doing

Understanding math ≠
knowing how to implement it

"I understood the equation. Then I tried to implement it and realised I didn't know what to initialise, which loop goes first, or what the actual input shape was."

Pseudocode forces you to answer questions equations avoid:

  • What are the inputs and outputs?
  • What is the loop structure?
  • What is the order of operations?
  • What needs to be initialised before training?
02
M4 · L1 — The Mapping

Math → Code vocabulary

Every math construct
maps to a code construct

Math notationCode constructExample
∑_{i∈I_u}for loop over setfor i in user_items[u]:
argmin_θ L(θ)optimiser.step()optimizer.step() after loss.backward()
θ ← θ − α∇Lparameter updateparam -= lr * param.grad
P ∈ ℝ^{m×d}matrix initialisationP = nn.Embedding(m, d)
p_u^⊤ q_idot producttorch.dot(p_u, q_i)
σ(x)activation functiontorch.sigmoid(x)
e_u^{(l+1)} = AGG(...)message passing layerh = conv_layer(h, edge_index)
03
M4 · L1 — Four Elements

What to identify before writing pseudocode

Four elements in
every algorithm

01

Inputs

What data goes in? User IDs, item IDs, ratings, embeddings, adjacency matrices — list them all explicitly.

02

Outputs

What does it return? A score, a ranked list, updated embeddings, a loss value?

03

Loop structure

What iterates over what? Epochs → batches → user-item pairs. Nested Σ = nested loop.

04 — most commonly missed

Updates — what changes each step?

Which variables are being modified? Embeddings? Parameters? Gradient accumulators? Papers often describe the loss but skip describing what exactly gets updated and when.

04
M4 · L1 — Worked Example

MF-BPR — the equation

Step 1: decode
what the math says

ℒ_BPR = −∑_{(u,i,j)∈𝒟} ln σ(p_u^⊤q_i − p_u^⊤q_j) + λ(‖P‖²_F + ‖Q‖²_F)
PieceMeaningIn code
∑_{(u,i,j)∈𝒟}loop over training triplesfor u, i, j in dataloader:
p_u^⊤q_i − p_u^⊤q_jpositive score minus negative scorescore = dot(p[u],q[i]) - dot(p[u],q[j])
ln σ(·)log-sigmoid of score diffF.logsigmoid(score)
−∑ ···negate for minimisationloss = -logsigmoid.sum()
λ‖P‖²_F + λ‖Q‖²_FL2 regularisation on embeddingsloss += λ * (P.norm()**2 + Q.norm()**2)
05
M4 · L1 — Worked Example

MF-BPR — the pseudocode

Step 2: write
the algorithm

# Inputs: user set U, item set I, observed interactions O # Hyperparams: d (dims), α (lr), λ (reg), epochs INITIALISE P ∈ ℝ^{m×d}, Q ∈ ℝ^{n×d} # random small values FOR epoch = 1 to epochs: FOR each (u, i, j) in sample_triples(O): # j is negative sample score = dot(P[u], Q[i]) - dot(P[u], Q[j]) loss = -log_sigmoid(score) loss += λ * (‖P[u]‖² + ‖Q[i]‖² + ‖Q[j]‖²) # Gradient step P[u] ← P[u] - α * ∂loss/∂P[u] Q[i] ← Q[i] - α * ∂loss/∂Q[i] Q[j] ← Q[j] - α * ∂loss/∂Q[j] RETURN P, Q

Notice what the equation didn't say: P and Q are initialised randomly. Only the embeddings of u, i, j are updated per step (not the full matrices). j is sampled — the paper leaves out how.

06
M4 · L1 — Second Example

LightGCN forward pass

Equations → pseudocode
for graph propagation

e_u^{(l+1)} = ∑_{i∈𝒩_u} (1/√|𝒩_u|√|𝒩_i|) · e_i^{(l)}

e_u^* = (1/L+1) ∑_{l=0}^{L} e_u^{(l)}

Two equations. They encode: one propagation step + final aggregation.

# Forward pass INPUT: E^(0) ∈ ℝ^{(m+n)×d}, graph A all_embeddings = [E^(0)] FOR l = 1 to L: # Normalised graph convolution E^(l) = D^{-1/2} · A · D^{-1/2} · E^(l-1) all_embeddings.append(E^(l)) # Average across all layers E_final = mean(all_embeddings, axis=0) # Split back into user / item P = E_final[:m] Q = E_final[m:] RETURN P, Q
07
M4 · L1 — The Process

A repeatable workflow

Equation to pseudocode
in 4 steps

01

List all symbols

What is every variable? What is its shape? Is it learned or computed?

02

Find the loops

Every Σ is a loop. What does it range over? What is accumulated?

03

Identify updates

Which variables change? What is the ← assignment? What triggers it?

04

Write inputs/outputs

State explicitly: what goes in, what comes out. If you can't — re-read the paper.

The test: hand your pseudocode to someone else. If they could implement it without reading the paper — you've done it right.

08
M4 · L1 — Key Takeaways

What to remember

01

Σ = for loop

Every summation is a loop. Every nested Σ is a nested loop. Translate mechanically.

02

Equations hide context

Initialisations, loop order, negative sampling strategy — equations assume you know these. You don't, until you ask.

03

Pseudocode is understanding

If you can write pseudocode for a paper, you understand it. If you can't, you don't — regardless of how fluent the math looks.

Next: M4 · L2 — Spotting Missing Details

09
← → arrow keys to navigate