M4 · Lesson 1 — Reproducing Works

From Equations
to Pseudocode

You don't truly understand a paper until you can
translate its math into step-by-step logic.

01

M4 · L1 — Why This Matters

The gap between reading and doing

Understanding math ≠
knowing how to implement it

"I understood the equation. Then I tried to implement it and realised I didn't know what to initialise, which loop goes first, or what the actual input shape was."

Pseudocode forces you to answer questions equations avoid:

What are the inputs and outputs?
What is the loop structure?
What is the order of operations?
What needs to be initialised before training?

02

M4 · L1 — The Mapping

Math → Code vocabulary

Every math construct
maps to a code construct

Math notation	Code construct	Example
∑_{i∈I_u}	for loop over set	for i in user_items[u]:
argmin_θ L(θ)	optimiser.step()	optimizer.step() after loss.backward()
θ ← θ − α∇L	parameter update	param -= lr * param.grad
P ∈ ℝ^{m×d}	matrix initialisation	P = nn.Embedding(m, d)
p_u^⊤ q_i	dot product	torch.dot(p_u, q_i)
σ(x)	activation function	torch.sigmoid(x)
e_u^{(l+1)} = AGG(...)	message passing layer	h = conv_layer(h, edge_index)

03

M4 · L1 — Four Elements

What to identify before writing pseudocode

Four elements in
every algorithm

01

Inputs

What data goes in? User IDs, item IDs, ratings, embeddings, adjacency matrices — list them all explicitly.

02

Outputs

What does it return? A score, a ranked list, updated embeddings, a loss value?

03

Loop structure

What iterates over what? Epochs → batches → user-item pairs. Nested Σ = nested loop.

04 — most commonly missed

Updates — what changes each step?

Which variables are being modified? Embeddings? Parameters? Gradient accumulators? Papers often describe the loss but skip describing what exactly gets updated and when.

04

M4 · L1 — Worked Example

MF-BPR — the equation

Step 1: decode
what the math says

ℒ_BPR = −∑_{(u,i,j)∈𝒟} ln σ(p_u^⊤q_i − p_u^⊤q_j) + λ(‖P‖²_F + ‖Q‖²_F)

Piece	Meaning	In code
∑_{(u,i,j)∈𝒟}	loop over training triples	for u, i, j in dataloader:
p_u^⊤q_i − p_u^⊤q_j	positive score minus negative score	score = dot(p[u],q[i]) - dot(p[u],q[j])
ln σ(·)	log-sigmoid of score diff	F.logsigmoid(score)
−∑ ···	negate for minimisation	loss = -logsigmoid.sum()
λ‖P‖²_F + λ‖Q‖²_F	L2 regularisation on embeddings	loss += λ * (P.norm()2 + Q.norm()2)

05

M4 · L1 — Worked Example

MF-BPR — the pseudocode

Step 2: write
the algorithm

# Inputs: user set U, item set I, observed interactions O
# Hyperparams: d (dims), α (lr), λ (reg), epochs

INITIALISE P ∈ ℝ^{m×d}, Q ∈ ℝ^{n×d}  # random small values

FOR epoch = 1 to epochs:
  FOR each (u, i, j) in sample_triples(O):   # j is negative sample
    score = dot(P[u], Q[i]) - dot(P[u], Q[j])
    loss  = -log_sigmoid(score)
    loss += λ * (‖P[u]‖² + ‖Q[i]‖² + ‖Q[j]‖²)

    # Gradient step
    P[u] ← P[u] - α * ∂loss/∂P[u]
    Q[i] ← Q[i] - α * ∂loss/∂Q[i]
    Q[j] ← Q[j] - α * ∂loss/∂Q[j]

RETURN P, Q
      

Notice what the equation didn't say: P and Q are initialised randomly. Only the embeddings of u, i, j are updated per step (not the full matrices). j is sampled — the paper leaves out how.

06

M4 · L1 — Second Example

LightGCN forward pass

Equations → pseudocode
for graph propagation

e_u^{(l+1)} = ∑_{i∈𝒩_u} (1/√|𝒩_u|√|𝒩_i|) · e_i^{(l)}

e_u^* = (1/L+1) ∑_{l=0}^{L} e_u^{(l)}

Two equations. They encode: one propagation step + final aggregation.

# Forward pass
INPUT: E^(0) ∈ ℝ^{(m+n)×d}, graph A

all_embeddings = [E^(0)]

FOR l = 1 to L:
  # Normalised graph convolution
  E^(l) = D^{-1/2} · A · D^{-1/2} · E^(l-1)
  all_embeddings.append(E^(l))

# Average across all layers
E_final = mean(all_embeddings, axis=0)

# Split back into user / item
P = E_final[:m]
Q = E_final[m:]

RETURN P, Q
        

07

M4 · L1 — The Process

A repeatable workflow

Equation to pseudocode
in 4 steps

01

List all symbols

What is every variable? What is its shape? Is it learned or computed?

02

Find the loops

Every Σ is a loop. What does it range over? What is accumulated?

03

Identify updates

Which variables change? What is the ← assignment? What triggers it?

04

Write inputs/outputs

State explicitly: what goes in, what comes out. If you can't — re-read the paper.

The test: hand your pseudocode to someone else. If they could implement it without reading the paper — you've done it right.

08

M4 · L1 — Key Takeaways

What to remember

01

Σ = for loop

Every summation is a loop. Every nested Σ is a nested loop. Translate mechanically.

02

Equations hide context

Initialisations, loop order, negative sampling strategy — equations assume you know these. You don't, until you ask.

03

Pseudocode is understanding

If you can write pseudocode for a paper, you understand it. If you can't, you don't — regardless of how fluent the math looks.

Next: M4 · L2 — Spotting Missing Details

09

From Equationsto Pseudocode