M1 · Lesson 3 — Math Notation Literacy

Summation, Product
& Set Notation

Loss functions, evaluation metrics, training objectives — all built from summations.
Learn to read Σ as a for-loop in disguise.

01
M1 · L3 — Summation

The most used operator in recommender-systems (RS) math

Σ is a for-loop —
read it that way

\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \]

But in RS papers, the limits are almost never 1 to n. They're sets:

\[ \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}_u} r_{ui} \]
Plain English
For each user u in the user set, for each item i that u has rated, accumulate the rating value.
# Nested Σ = nested for-loop
# ∑_{u∈U} ∑_{i∈I_u} r_{ui}
total = 0
for u in U:
    for i in I_u[u]:  # items u rated
        total += r[u][i]

The mental translation: Every Σ is a loop. Every nested Σ is a nested loop. Translate mechanically — it always works.
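To make the loop reading concrete, here is a minimal runnable sketch. The `ratings` dictionary and its user and item names are hypothetical toy data, not taken from any dataset; the point is only that one loop per Σ reproduces the double sum.

```python
# Toy data (hypothetical): ratings[u][i] is the rating user u gave item i.
ratings = {
    "alice": {"matrix": 5, "up": 3},
    "bob": {"matrix": 4},
    "carol": {},  # a user with no ratings contributes nothing to the sum
}

U = list(ratings)                           # user set U
I_u = {u: list(ratings[u]) for u in U}      # I_u: items each user has rated

# Mechanical translation: one loop per sigma.
total = 0
for u in U:
    for i in I_u[u]:
        total += ratings[u][i]

# The same double sum written as a single generator expression.
total_compact = sum(ratings[u][i] for u in U for i in I_u[u])

print(total, total_compact)  # 12 12
```

The inner loop runs over a different set for each user, which is exactly what the subscript on ℐ_u signals.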

02
M1 · L3 — Set Notation

The RS set vocabulary

Sets you'll see in
every RS paper

| Notation | Read As | RS Meaning | Example Use |
|---|---|---|---|
| 𝒪 | "observed set" | All known (u,i) interaction pairs | (u,i) ∈ 𝒪 means we know r_{ui} |
| 𝒩_u | "neighbourhood of u" | Items user u has interacted with | i ∈ 𝒩_u → u rated i |
| ℐ⁺_u | "positive items of u" | Items u has interacted with (same as 𝒩_u) | {i ∈ ℐ : r_{ui} > 0} |
| ℐ⁻_u | "negative items of u" | Items u has NOT interacted with | Sampled as negatives in BPR |
| 𝒟 | "training data" | Set of training triples (u,i,j) | (u,i,j) ∈ 𝒟 in BPR loss |
| \|𝒰\| | "cardinality of U" | Total number of users | \|𝒰\| = n = 6,038 |
| {x \| condition} | "set of x such that" | Filter a set by a condition | {i ∈ ℐ \| r_{ui} > 0} = u's positive items |
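Most rows of the table map directly onto Python sets. The sketch below assumes a hypothetical toy `ratings` dictionary in which a stored value greater than zero counts as an interaction; the names `observed`, `positive_items`, and `negative_items` are illustrative stand-ins for 𝒪, ℐ⁺_u, and ℐ⁻_u.

```python
# Toy interaction data (hypothetical): ratings[u][i] > 0 means u interacted with i.
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i2": 4},
    "u3": {"i1": 2, "i3": 1},
}
users = set(ratings)                  # U, the user set
items = {"i1", "i2", "i3", "i4"}      # I, the item set (i4 has no interactions)

# O: every observed (u, i) pair.
observed = {(u, i) for u, rated in ratings.items() for i in rated}

def positive_items(u):
    """I+_u = {i in I | r_ui > 0}."""
    return {i for i in items if ratings[u].get(i, 0) > 0}

def negative_items(u):
    """I-_u: items u has NOT interacted with, as a set difference."""
    return items - positive_items(u)

n_users = len(users)                  # |U|, the cardinality of the user set

print(sorted(positive_items("u2")))   # ['i2']
print(sorted(negative_items("u2")))   # ['i1', 'i3', 'i4']
print(n_users)                        # 3
```

Membership tests such as `("u1", "i2") in observed` then read the same way as (u,i) ∈ 𝒪.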
03
M1 · L3 — Set Builder Notation

Reading the filter operator

The vertical bar | means
"such that" — it filters

\[ \{i \in \mathcal{I}\ |\ r_{ui} > 0\} \]
Reading it
{
Start of a set definition
i ∈ ℐ
Consider all items i from the item set ℐ
|
"such that" — filter by the condition that follows
r_{ui} > 0
Only keep items where u's rating is positive
Plain English
The set of all items that user u has interacted with (positively rated).
# {i ∈ I | r_{ui} > 0} in Python
pos_items_u = [
    i for i in I
    if ratings[u][i] > 0
]
# Or as a set comprehension:
# {i | r_{ui} > 0}
pos_items_u_set = {i for i in I if ratings[u][i] > 0}

|ℐ⁺_u| = the count of u's positive items (cardinality). The outer | · | bars mean "how many", not "such that".

04
M1 · L3 — Product Notation

When you see Π — think multiplication loop

Product notation and
why we always take log

\[ \prod_{i=1}^n x_i = x_1 \times x_2 \times \cdots \times x_n \]

In RS, Π appears in likelihood functions — the probability of observing all ratings assuming independence:

\[ \mathcal{L} = \prod_{(u,i)\in\mathcal{O}} P(r_{ui} \mid \theta) \]

"The probability of observing all ratings, given parameters θ."

Why we immediately take log:

\[ \ln \prod P_i = \sum \ln P_i \]
Reason 1

Numerical stability — products of small probabilities underflow to zero

Reason 2

Same optimum: log is strictly increasing, so the argmax doesn't change

Reason 3

Clean gradients: the log turns the product into a sum, so the gradient is a sum of per-term gradients, each using d/dx ln(x) = 1/x
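The underflow in Reason 1 is easy to reproduce. The sketch below uses a hypothetical toy case of 1,000 independent observations, each with probability 0.01; the numbers are chosen only so that the true product falls below the float64 range.

```python
import math

# Hypothetical toy case: 1,000 independent observations, each with probability 0.01.
probs = [0.01] * 1000

# Direct product: the true value is 1e-2000, far below the smallest float64.
likelihood = math.prod(probs)

# Sum of logs: 1000 * ln(0.01), a perfectly ordinary float.
log_likelihood = math.fsum(math.log(p) for p in probs)

print(likelihood)       # 0.0
print(log_likelihood)   # roughly -4605.17
```

Once the product has collapsed to 0.0 there is nothing left to optimise, while the log-likelihood still distinguishes better parameters from worse ones.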

05
M1 · L3 — BPR Loss Decoded

Full worked example

Decoding the BPR loss
from scratch

\[ \mathcal{L}_{BPR} = -\sum_{(u,i,j)\in\mathcal{D}} \ln\sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda\|\theta\|^2 \]
Every symbol decoded
∑_{(u,i,j)∈𝒟}
Loop over all (user, positive item, negative item) triples in training data
ŷ_{ui} − ŷ_{uj}
Difference between model score for positive item and negative item
σ(·)
Sigmoid: converts the score difference to a probability ∈ (0,1)
ln σ(·)
Log-probability — we maximise this (equivalent to minimising its negative)
−∑ ···
Negative sign: we minimise loss, so flip the maximisation
λ‖θ‖²
L2 regularisation: penalise large parameter values
Plain English
For every training triple, penalise the model when it scores a negative item higher than a positive item. Also penalise large parameter values to prevent overfitting.
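The decoded formula translates almost symbol for symbol into code. This is a minimal sketch under stated assumptions, not a reference implementation: `bpr_loss`, `log_sigmoid`, the score tables, and the parameter vector `theta` are all hypothetical, and the scores are hard-coded rather than computed from `theta` as they would be in a real model.

```python
import math

def log_sigmoid(x):
    """Numerically stable ln(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bpr_loss(triples, score, theta, lam):
    """-sum of ln sigmoid(y_ui - y_uj) over (u, i, j) triples, plus lam * ||theta||^2."""
    ranking_term = -sum(
        log_sigmoid(score[u][i] - score[u][j]) for u, i, j in triples
    )
    reg_term = lam * sum(w * w for w in theta)
    return ranking_term + reg_term

# Hypothetical toy setup: (user, positive item, negative item) triples.
triples = [("u1", "a", "b"), ("u2", "b", "c")]
good_scores = {"u1": {"a": 2.0, "b": -1.0}, "u2": {"b": 1.5, "c": 0.0}}
bad_scores = {"u1": {"a": -1.0, "b": 2.0}, "u2": {"b": 0.0, "c": 1.5}}
theta = [0.5, -0.5]   # stand-in parameter vector, used only by the L2 term
lam = 0.01

loss_good = bpr_loss(triples, good_scores, theta, lam)
loss_bad = bpr_loss(triples, bad_scores, theta, lam)
print(loss_good < loss_bad)  # True
```

The model that ranks each positive item above its negative gets the lower loss, which matches the plain-English reading above.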
06
M1 · L3 — Key Takeaways

What to remember

01

Σ = for-loop

Every summation is a loop. Nested Σ = nested loop. Translate mechanically. Always works.

02

| inside {·} = "such that"

{i ∈ ℐ | r_{ui} > 0} = filter operation. Bars wrapped around a set, |·|, mean cardinality (count). Context distinguishes them.

03

Π → log → Σ

Product of probabilities always becomes sum of log-probabilities. This is a reflex move in ML derivations.

Next: M1 · L4 — Norms, Distances & Regularisation
