Loss functions, evaluation metrics, training objectives — all built from summations.
Learn to read Σ as a for-loop in disguise.
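But in RS papers, the limits are almost never 1 to n. They're sets: Σ_{(u,i) ∈ 𝒪} runs over every observed user-item pair, and Σ_{i ∈ 𝒩_u} runs over the items user u has interacted with.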
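The mental translation: every Σ is a loop, and every nested Σ is a nested loop. The subscript gives the loop variable and the range or set it runs over. The term after Σ is what gets added on each pass. Translate mechanically and it always works.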
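A minimal sketch of that mechanical translation. It uses a tiny made-up set of observed ratings, and `observed`, `predict` and every value below are hypothetical. The first sum runs over an index range 1..n. The second runs over the set 𝒪. The third is a nested sum Σ_u Σ_{i ∈ 𝒩_u}, which becomes a nested loop.

```python
# Hypothetical toy data: observed ratings r_ui keyed by (user, item), i.e. the set 𝒪.
observed = {
    ("u1", "i1"): 5.0,
    ("u1", "i3"): 3.0,
    ("u2", "i2"): 4.0,
}


def predict(u, i):
    """Hypothetical stand-in model: predicts 4.0 for every (u, i) pair."""
    return 4.0


# Σ_{k=1}^{n} x_k : a loop over the index range 1..n.
x = [1.0, 2.0, 3.0]
total = 0.0
for k in range(len(x)):
    total += x[k]

# Σ_{(u,i) ∈ 𝒪} (r_ui - r̂_ui)² : a loop over the observed set instead of 1..n.
sse = 0.0
for (u, i), r_ui in observed.items():
    sse += (r_ui - predict(u, i)) ** 2

# Σ_{u ∈ 𝒰} Σ_{i ∈ 𝒩_u} r_ui : a nested Σ is a nested loop.
users = {u for (u, _) in observed}      # users appearing in 𝒪, standing in for 𝒰
neighbourhood = {u: {i for (v, i) in observed if v == u} for u in users}  # 𝒩_u
nested_total = 0.0
for u in users:                    # outer Σ over users
    for i in neighbourhood[u]:     # inner Σ over the items u interacted with
        nested_total += observed[(u, i)]

print(total, sse, nested_total)    # 6.0 2.0 12.0
```

The only thing that changes between the three loops is what the `for` iterates over. The accumulate-into-a-total pattern stays the same.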
| Notation | Read As | RS Meaning | Example Use |
|---|---|---|---|
| 𝒪 | "observed set" | All known (u,i) interaction pairs | (u,i) ∈ 𝒪 means we know r_{ui} |
| 𝒩_u | "neighbourhood of u" | Items user u has interacted with | i ∈ 𝒩_u → u rated i |
| ℐ⁺_u | "positive items of u" | Items u has interacted with (same as 𝒩_u) | {i ∈ ℐ : r_{ui} > 0} |
| ℐ⁻_u | "negative items of u" | Items u has NOT interacted with | Sampled as negatives in BPR |
| 𝒟 | "training data" | Set of training triples (u,i,j) | (u,i,j) ∈ 𝒟 in BPR loss |
| \|𝒰\| | "cardinality of 𝒰" | Total number of users | \|𝒰\| = n = 6,038 |
| {x \| condition} | "set of x such that" | Filter a set by a condition | {i ∈ ℐ \| r_{ui} > 0} = u's positive items |
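Each row of the table maps onto an ordinary Python set. The sketch below assumes a hypothetical 3-user, 4-item rating dict where 0 means "no interaction". Set-builder notation becomes a set comprehension. The BPR triples (u, i, j) are enumerated exhaustively here because the example is tiny. In practice, BPR samples j from ℐ⁻_u rather than listing every triple.

```python
# Hypothetical toy ratings: R[u][i] = r_ui, where 0 means "no interaction".
R = {
    "u1": {"i1": 5, "i2": 0, "i3": 3, "i4": 0},
    "u2": {"i1": 0, "i2": 4, "i3": 0, "i4": 0},
    "u3": {"i1": 1, "i2": 0, "i3": 0, "i4": 2},
}
users = set(R)                                    # 𝒰
items = {i for row in R.values() for i in row}    # ℐ

# 𝒪: every (u, i) pair with a known interaction.
observed = {(u, i) for u in users for i in items if R[u][i] > 0}

# ℐ⁺_u = {i ∈ ℐ | r_ui > 0}: set-builder notation is a filtering comprehension.
pos = {u: {i for i in items if R[u][i] > 0} for u in users}

# ℐ⁻_u = ℐ \ ℐ⁺_u: the items u has NOT interacted with.
neg = {u: items - pos[u] for u in users}

# 𝒟 = {(u, i, j) | i ∈ ℐ⁺_u, j ∈ ℐ⁻_u}: BPR triples, enumerated exhaustively here.
triples = {(u, i, j) for u in users for i in pos[u] for j in neg[u]}

print(len(users), len(observed), len(triples))    # 3 5 11
```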
|ℐ⁺_u| is the number of u's positive items (the cardinality of the set). Bars that enclose a set, |·|, mean "how many". A single bar inside set braces, as in {x | condition}, means "such that".
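Cardinality shows up constantly in evaluation metrics. One illustration is recall@k, sketched below. Definitions vary between papers, and this version assumes the common form with |ℐ⁺_u| in the denominator. The ranking and positive set are hypothetical. In code, |·| is simply `len`, and the filter from set-builder notation becomes a set intersection.

```python
def recall_at_k(ranked_items, positives, k):
    """|top-k ∩ ℐ⁺_u| / |ℐ⁺_u|, where each |·| (cardinality) becomes len()."""
    if not positives:
        return 0.0  # convention chosen for this sketch: undefined recall reported as 0
    top_k = set(ranked_items[:k])
    hits = top_k & positives        # set intersection, then count it
    return len(hits) / len(positives)


# Hypothetical example: one user's ranked list and that user's positive items.
ranking = ["i7", "i2", "i9", "i4", "i1"]
positives_u = {"i2", "i4", "i5"}    # |ℐ⁺_u| = 3

print(recall_at_k(ranking, positives_u, k=3))   # only i2 is a hit, so 1/3
```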
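In RS, Π (the product counterpart of Σ) appears in likelihood functions. Assume the ratings are independent given the model parameters θ. Then the probability of observing all of them is:
P(all observed ratings | θ) = Π_{(u,i) ∈ 𝒪} p(r_{ui} | θ)
Read it as: "the probability of observing all ratings, given parameters θ."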
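Why we immediately take the log:
- Products become sums: ln Π_{(u,i) ∈ 𝒪} p(r_{ui} | θ) = Σ_{(u,i) ∈ 𝒪} ln p(r_{ui} | θ). The objective is back to a single loop over 𝒪.
- Numerical stability: a product of many small probabilities underflows to zero in floating point. The sum of their logs stays representable.
- Same optimum: ln is strictly increasing, so the θ that maximises the likelihood also maximises the log-likelihood.
- Clean gradients: d/dx ln(x) = 1/x. The gradient of a sum is the sum of the per-term gradients.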
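A minimal numerical sketch of the likelihood and the log trick. All numbers are made up: `probs`, the dataset of 1,000 observations at probability 0.01, and the 7-clicks-in-10-impressions click model are hypothetical. The sketch first writes Π as a multiply-accumulate loop and checks that ln Π equals Σ ln. It then shows the underflow and the same-optimum points.

```python
import math

# Π as a loop: hypothetical per-rating probabilities p(r_ui | θ) for each (u, i) ∈ 𝒪.
probs = {("u1", "i1"): 0.9, ("u1", "i3"): 0.5, ("u2", "i2"): 0.8}
likelihood = 1.0
for p in probs.values():
    likelihood *= p                 # Π: start at 1 and multiply (Σ starts at 0 and adds)
log_likelihood = 0.0
for p in probs.values():
    log_likelihood += math.log(p)   # ln Π p = Σ ln p

# Underflow: 1,000 hypothetical observations, each with probability 0.01.
small = [0.01] * 1000
product = 1.0
for p in small:
    product *= p                    # true value 1e-2000 is far below the smallest float64
log_sum = 0.0
for p in small:
    log_sum += math.log(p)          # each term is about -4.605, so the total stays finite
print(product, log_sum)             # product prints 0.0; log_sum is about -4605.17

# Same optimum: hypothetical click model with 7 clicks in 10 impressions,
# each click with probability θ, so the likelihood is θ^7 (1 - θ)^3.
clicks, skips = 7, 3
grid = [k / 100 for k in range(1, 100)]   # candidate θ values in (0, 1)
best_by_lik = max(grid, key=lambda th: th**clicks * (1 - th) ** skips)
best_by_loglik = max(
    grid, key=lambda th: clicks * math.log(th) + skips * math.log(1 - th)
)
print(best_by_lik, best_by_loglik)        # 0.7 0.7
```

The grid search stands in for whatever optimiser a real model would use. The point is only that both objectives pick the same θ.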
Every summation is a loop. Nested Σ = nested loop. Translate mechanically. Always works.
{i ∈ ℐ | r_{ui} > 0} is a filter: a bar inside set braces means "such that". Bars around a set, |·|, mean cardinality (a count). Context distinguishes them.
A product of probabilities is almost always turned into a sum of log-probabilities. This is a reflex move in ML derivations.