Nearly every RS loss function ends with a regularisation term such as λ‖θ‖². Learn what norms measure
and why regularisation is not optional.
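For a vector x = [x₁, x₂, ..., x_d] and a matrix M, three norms matter in RS: the L2 norm ‖x‖₂ = √(∑ₖ xₖ²), the L1 norm ‖x‖₁ = ∑ₖ |xₖ|, and the Frobenius norm ‖M‖_F = √(∑ᵢⱼ Mᵢⱼ²).
| Norm | Applies to | Used in RS for |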
|---|---|---|
| ‖x‖₂ | Vector | L2 regularisation: λ‖p_u‖₂² |
| ‖x‖₁ | Vector | Sparse regularisation (less common) |
| ‖M‖_F | Matrix | Regularising full embedding matrix P or Q |
| ‖x‖ (no subscript) | Vector | Almost always means L2 in RS context |
No subscript = L2. When you see ‖·‖ without a subscript in an RS paper, assume L2 norm unless stated otherwise.
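The norms in the table can be checked numerically. The following is a minimal sketch using NumPy; the vector `x` and matrix `M` are made-up illustrative values, not data from any RS model.

```python
import numpy as np

# Illustrative vector (hypothetical values)
x = np.array([3.0, -4.0])

l2 = np.linalg.norm(x)           # L2 norm: sqrt(3^2 + 4^2) = 5
l1 = np.linalg.norm(x, ord=1)    # L1 norm: |3| + |-4| = 7
l2_squared = float(np.dot(x, x)) # squared L2 norm: sum of squared components = 25

# Illustrative matrix (hypothetical values), standing in for an embedding matrix
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])

fro = np.linalg.norm(M, ord="fro")  # Frobenius norm: sqrt(1 + 4 + 9 + 16)
fro_squared = float(np.sum(M ** 2)) # squared Frobenius norm = 30

print(l2, l1, l2_squared, fro, fro_squared)
```

The squared Frobenius norm is the sum of the squared L2 norms of the rows, which is why regularising a whole embedding matrix is the same as regularising every embedding vector in it.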
Dot product: multiply corresponding components and sum the results. A higher value means the two vectors are more aligned. Used to score user-item pairs in MF, LightGCN, and models trained with BPR.
p_u^⊤ q_i and p_u · q_i are the same thing: for vectors, transpose notation and dot notation are interchangeable.
Cosine similarity: the dot product divided by the product of both vectors' L2 norms. Values lie in [−1, 1]. It measures direction only, not magnitude.
Dot product rewards alignment and magnitude, so larger embeddings score higher. Cosine rewards alignment only. RS typically uses the dot product because embedding magnitude can carry signal, such as how active a user is.
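To make the difference concrete, here is a small sketch comparing the two scores. The function names `dot_score` and `cosine_score` and the embedding values are illustrative assumptions, not part of any particular library.

```python
import numpy as np

def dot_score(p, q):
    """Dot product: sum of component-wise products."""
    return float(np.dot(p, q))

def cosine_score(p, q):
    """Dot product divided by the product of the two L2 norms."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

# Hypothetical user and item embeddings
p_u = np.array([1.0, 2.0, 2.0])
q_i = np.array([2.0, 1.0, 2.0])

dot = dot_score(p_u, q_i)      # 1*2 + 2*1 + 2*2 = 8
cos = cosine_score(p_u, q_i)   # 8 / (3 * 3)

# Scaling the user embedding changes the dot product but not the cosine
dot_scaled = dot_score(3 * p_u, q_i)
cos_scaled = cosine_score(3 * p_u, q_i)

print(dot, dot_scaled)   # dot product grows with magnitude
print(cos, cos_scaled)   # cosine is unchanged
```

Tripling `p_u` triples the dot product while the cosine stays the same, which is the magnitude effect described above.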
Without regularisation, a model can make embedding values arbitrarily large to fit the training data almost perfectly, then generalise poorly to interactions it has not seen. This is overfitting.
Why squared norm? ‖θ‖² is differentiable everywhere, with gradient 2θ. Plain ‖θ‖ has a kink at zero, where its gradient θ/‖θ‖ is undefined. The squared version gives cleaner gradients for gradient descent.
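The contrast between the two gradients can be seen directly. This is a minimal sketch; `sq_norm_grad` and `norm_grad` are hypothetical helper names, and the gradients are the standard analytic ones (2θ for ‖θ‖², θ/‖θ‖ for ‖θ‖).

```python
import numpy as np

def sq_norm_grad(theta):
    """Gradient of ||theta||^2, defined for every theta."""
    return 2.0 * theta

def norm_grad(theta):
    """Gradient of ||theta||, which is theta / ||theta|| and undefined at zero."""
    return theta / np.linalg.norm(theta)

theta = np.array([3.0, -4.0])
g_sq = sq_norm_grad(theta)     # [6, -8]
g_plain = norm_grad(theta)     # [0.6, -0.8], always a unit vector

zero = np.zeros(2)
g_sq_zero = sq_norm_grad(zero)         # [0, 0], perfectly well defined
with np.errstate(invalid="ignore"):    # silence the 0/0 warning for the demo
    g_plain_zero = norm_grad(zero)     # [nan, nan], the kink at zero

print(g_sq, g_plain, g_sq_zero, g_plain_zero)
```

The squared-norm gradient also shrinks smoothly as θ approaches zero, whereas the plain-norm gradient keeps unit length until it breaks down at the origin.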
| λ value | Effect |
|---|---|
| λ = 0 | No regularisation → model is free to overfit the training data |
| λ very small | Weak regularisation → can still overfit |
| λ = 0.001 | Common starting point in RS; the best value depends on the dataset (tune via validation) |
| λ very large | All embeddings pushed toward zero → model underfits |
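
The shrinking effect of λ in the table can be illustrated with a toy problem. The sketch below assumes a simplified setting that is not from the lesson itself: item embeddings `Q` are held fixed, one user's ratings `r` are random synthetic numbers, and the user vector minimising ‖r − Qp‖² + λ‖p‖² is obtained in closed form (the ridge regression solution). The helper `solve_user_vector` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(20, 4))   # 20 items, 4 latent dimensions (synthetic)
r = rng.normal(size=20)        # one user's ratings for those items (synthetic)

def solve_user_vector(Q, r, lam):
    """Minimiser of ||r - Q p||^2 + lam * ||p||^2 for fixed Q."""
    d = Q.shape[1]
    return np.linalg.solve(Q.T @ Q + lam * np.eye(d), Q.T @ r)

lams = [0.0, 0.001, 1.0, 1000.0]
norms = [float(np.linalg.norm(solve_user_vector(Q, r, lam))) for lam in lams]

for lam, n in zip(lams, norms):
    print(f"lambda={lam:<8} ||p_u||={n:.4f}")
```

As λ grows, the length of the fitted embedding falls toward zero. This only shows the shrinkage; whether a given λ overfits or underfits still has to be judged on a validation set.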
MF version: regularise both the user matrix P and the item matrix Q, each with its own penalty term, for example λ(‖P‖_F² + ‖Q‖_F²).
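A regularised MF loss can be sketched as follows. This assumes a squared-error loss over observed ratings and a single shared λ for both matrices; the function name `mf_loss`, the `(u, i, r)` triple format, and all numbers are illustrative assumptions. Some implementations use separate coefficients for P and Q.

```python
import numpy as np

def mf_loss(P, Q, ratings, lam):
    """Squared error on observed ratings plus lam * (||P||_F^2 + ||Q||_F^2).

    ratings is a list of (user_index, item_index, rating) triples.
    """
    sq_err = sum((r - P[u] @ Q[i]) ** 2 for u, i, r in ratings)
    reg = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return float(sq_err + reg)

# Hypothetical 2-user, 2-item example with 2 latent dimensions
P = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = np.array([[1.0, 1.0],
              [2.0, 0.0]])
ratings = [(0, 0, 1.0), (1, 1, 0.5)]

loss = mf_loss(P, Q, ratings, lam=0.1)
# squared error: (1 - 1)^2 + (0.5 - 0)^2 = 0.25
# penalty: 0.1 * (2 + 6) = 0.8
print(loss)
```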
Squared L2 norm = sum of squared components. For matrices, the equivalent is the Frobenius norm. ‖·‖ without a subscript means L2 in RS unless stated otherwise.
p_u^⊤ q_i = ∑ₖ p_{uk}·q_{ik}. Cosine removes the magnitude effect. RS models typically prefer the dot product.
λ = 0 → overfit. λ too large → underfit. Always tune λ on a validation set. The norm is squared because the squared form is differentiable everywhere.