You don't truly understand a paper until you can
translate its math into step-by-step logic.
Pseudocode forces you to answer the questions that equations avoid, and much of the translation is mechanical:
| Math notation | Code construct | Example |
|---|---|---|
| ∑_{i∈I_u} | for loop over set | for i in user_items[u]: |
| argmin_θ L(θ) | optimisation loop | loss.backward(); optimizer.step() on every batch |
| θ ← θ − α∇L | parameter update | with torch.no_grad(): param -= lr * param.grad |
| P ∈ ℝ^{m×d} | matrix initialisation | P = nn.Embedding(m, d) |
| p_u^⊤ q_i | dot product | torch.dot(p_u, q_i) |
| σ(x) | activation function | torch.sigmoid(x) |
| e_u^{(l+1)} = AGG(...) | message passing layer | h = conv_layer(h, edge_index) |
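
To make a few rows of the table concrete, here is a minimal sketch in plain Python rather than PyTorch, so that every loop stays visible. The toy data (`user_items`, `P`, `Q`) and the helper names are made up for illustration; the quantity computed, a sum of sigmoid scores over a user's items, is only an example of a ∑ over a set and is not taken from any particular paper.

```python
import math

# Hypothetical toy data: 2 users, 3 items, embedding size d = 2.
user_items = {0: [0, 2], 1: [1]}          # I_u: items each user interacted with
P = [[0.1, 0.2], [0.3, 0.4]]              # P in R^{m x d}, one row per user
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # Q in R^{n x d}, one row per item


def dot(a, b):
    """p_u^T q_i: itself a sum over the embedding dimensions."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sum_of_scores(u):
    """Sum over i in I_u of sigma(p_u^T q_i), written as an explicit loop."""
    total = 0.0
    for i in user_items[u]:        # the sum over i in I_u becomes a for loop
        total += sigmoid(dot(P[u], Q[i]))
    return total
```

Calling `sum_of_scores(1)` walks the single item in `user_items[1]`; swapping `dot` and `sigmoid` for `torch.dot` and `torch.sigmoid` gives the tensor version from the table.
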
What data goes in? User IDs, item IDs, ratings, embeddings, adjacency matrices — list them all explicitly.
What does it return? A score, a ranked list, updated embeddings, a loss value?
What iterates over what? Epochs → batches → user-item pairs. Nested Σ = nested loop.
Which variables are being modified? Embeddings? Parameters? Gradient accumulators? Papers often describe the loss but skip describing what exactly gets updated and when.
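
The four questions above can be answered directly in the shape of a function. The sketch below assumes a plain matrix-factorisation model trained with squared error, chosen only because it is small; `train_mf`, its hyperparameters, and the Gaussian initialisation are illustrative assumptions, not a method from a specific paper.

```python
import random


def train_mf(ratings, m, n, d=2, lr=0.05, epochs=200, batch_size=2, seed=0):
    """What goes in: (user, item, rating) triples, table sizes, hyperparameters.
    What comes out: embeddings P and Q plus the summed squared error per epoch."""
    rng = random.Random(seed)
    # What is modified: only these two tables change during training.
    P = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(m)]
    Q = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n)]
    history = []
    for _ in range(epochs):                                  # epochs
        data = ratings[:]
        rng.shuffle(data)
        epoch_loss = 0.0
        for start in range(0, len(data), batch_size):        # batches
            # Updates are applied per pair; the batch loop only groups the data.
            for u, i, r in data[start:start + batch_size]:   # user-item pairs
                pred = sum(P[u][k] * Q[i][k] for k in range(d))
                err = pred - r
                epoch_loss += err ** 2
                for k in range(d):                           # rows u and i only
                    pu, qi = P[u][k], Q[i][k]
                    P[u][k] -= lr * 2 * err * qi
                    Q[i][k] -= lr * 2 * err * pu
        history.append(epoch_loss)
    return P, Q, history
```

The signature and docstring state the inputs and outputs, the three nested `for` loops show what iterates over what, and the two in-place updates show which rows change and when.
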
| Piece | Meaning | In code |
|---|---|---|
| ∑_{(u,i,j)∈𝒟} | loop over training triples | for u, i, j in dataloader: |
| p_u^⊤q_i − p_u^⊤q_j | positive score minus negative score | score = dot(p[u],q[i]) - dot(p[u],q[j]) |
| ln σ(·) | log-sigmoid of score diff | F.logsigmoid(score) |
| −∑ ··· | negate for minimisation | loss = -F.logsigmoid(score).sum() |
| λ‖P‖²_F + λ‖Q‖²_F | L2 regularisation on embeddings | loss += λ * (P.norm()**2 + Q.norm()**2) |
Notice what the equation didn't say: P and Q are initialised randomly. Only the embeddings of u, i, j are updated per step (not the full matrices). j is sampled — the paper leaves out how.
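
The sketch below writes one stochastic gradient step for this loss in plain Python, with the three unstated details made explicit. Everything the equation leaves out is an assumption here and is labelled as such: the Gaussian initialisation, the uniform negative sampler, and applying the L2 penalty only to the rows touched by the current triple. The function names are hypothetical.

```python
import math
import random


def init_embeddings(rows, d, rng, scale=0.1):
    """Assumed initialisation: small Gaussian noise. The equation never says."""
    return [[rng.gauss(0, scale) for _ in range(d)] for _ in range(rows)]


def sample_negative(u, user_items, n_items, rng):
    """Assumed sampler: uniform over items user u has not interacted with."""
    while True:
        j = rng.randrange(n_items)
        if j not in user_items[u]:
            return j


def bpr_step(P, Q, u, i, j, lr=0.05, lam=0.01):
    """One SGD step on a single (u, i, j) triple.
    Returns the data term -ln sigma(x) for that triple, before the update."""
    d = len(P[u])
    x = sum(P[u][k] * (Q[i][k] - Q[j][k]) for k in range(d))  # score difference
    loss = math.log1p(math.exp(-x))      # -ln sigma(x), fine for moderate x
    g = 1.0 / (1.0 + math.exp(x))        # sigma(-x), the size of the gradient
    for k in range(d):
        pu, qi, qj = P[u][k], Q[i][k], Q[j][k]
        # Only rows u, i and j change; the L2 term is applied to them alone.
        P[u][k] -= lr * (-g * (qi - qj) + 2 * lam * pu)
        Q[i][k] -= lr * (-g * pu + 2 * lam * qi)
        Q[j][k] -= lr * (g * pu + 2 * lam * qj)
    return loss
```

A training loop would call `sample_negative` once per observed (u, i) pair and pass the result to `bpr_step`. Changing either assumed helper changes the algorithm without changing a single symbol in the equation.
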
Two equations. They encode: one propagation step + final aggregation.
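
As a rough illustration of "one propagation step + final aggregation", the sketch below assumes the simplest possible choices: AGG is an unweighted mean over neighbours, and the final embedding is the average over all layers. Neither choice comes from the text, so treat both as placeholders for whatever the paper actually defines; `propagate` and `final_embeddings` are hypothetical names.

```python
def propagate(emb, neighbours):
    """One propagation step: each node's new embedding is the mean of its
    neighbours' current embeddings (assumed AGG; the paper's may differ).
    Assumes emb and neighbours share the same keys."""
    new_emb = {}                                  # separate dict: every node
    for node, nbrs in neighbours.items():         # reads layer-l values only
        d = len(emb[node])
        acc = [0.0] * d
        for nb in nbrs:                           # the sum over neighbours
            for k in range(d):
                acc[k] += emb[nb][k]
        # Isolated nodes keep their embedding (another assumption).
        new_emb[node] = [a / len(nbrs) for a in acc] if nbrs else list(emb[node])
    return new_emb


def final_embeddings(emb0, neighbours, num_layers):
    """Final aggregation: average the embeddings of layers 0..num_layers."""
    layers = [emb0]
    for _ in range(num_layers):                   # the loop over layers l
        layers.append(propagate(layers[-1], neighbours))
    return {
        node: [sum(layer[node][k] for layer in layers) / len(layers)
               for k in range(len(emb0[node]))]
        for node in emb0
    }
```

Writing it out surfaces a detail the equations leave implicit: `propagate` builds a new dictionary instead of updating in place, so every node at layer l+1 is computed from layer-l values only.
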
What is every variable? What is its shape? Is it learned or computed?
Every Σ is a loop. What does it range over? What is accumulated?
Which variables change? What is the ← assignment? What triggers it?
State explicitly what goes in and what comes out. If you can't, re-read the paper.
The test: hand your pseudocode to someone else. If they could implement it without reading the paper, you've done it right.
Every summation is a loop. Every nested Σ is a nested loop. Translate mechanically.
Initialisations, loop order, negative sampling strategy — equations assume you know these. You don't, until you ask.
If you can write pseudocode for a paper, you understand it. If you can't, you don't — regardless of how fluent the math looks.