M1 · Lesson 3 — Math Notation Literacy

Summation, Product
& Set Notation

Loss functions, evaluation metrics, training objectives — all built from summations.
Learn to read Σ as a for-loop in disguise.

01

M1 · L3 — Summation

The most used operator in RS math

Σ is a for-loop —
read it that way

\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \]

But in RS papers, the limits are almost never 1 to n. They're sets:

\[ \sum_{u \in \mathcal{U}} \sum_{i \in \mathcal{I}_u} r_{ui} \]

Plain English

For each user u in the user set, for each item i that u has rated, accumulate the rating value.

# Nested Σ = nested for-loop
# ∑_{u∈U} ∑_{i∈I_u} r_{ui}

total = 0.0
for u in U:
    for i in I_u[u]:  # items u rated
        total += r[u][i]

The mental translation: Every Σ is a loop. Every nested Σ is a nested loop. Translate mechanically — it always works.
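To check the translation on something concrete, here is a minimal sketch on made-up data. The dictionary `ratings` and its user and item names are hypothetical. The user set 𝒰 is taken to be the dictionary's keys, and ℐ_u is taken to be the items stored under user u.

```python
# Toy data (hypothetical): ratings[u][i] plays the role of r_{ui}.
ratings = {
    "alice": {"item_a": 5.0, "item_b": 3.0},
    "bob": {"item_b": 4.0},
    "carol": {},  # no ratings: the inner sum is empty and contributes 0
}


def total_rating(ratings):
    """Compute the double sum of r_ui over u in U and i in I_u."""
    total = 0.0
    for u in ratings:          # outer sum over u in U
        for i in ratings[u]:   # inner sum over i in I_u
            total += ratings[u][i]
    return total


# The same double sum written as a single generator expression.
total_flat = sum(r for items in ratings.values() for r in items.values())

print(total_rating(ratings), total_flat)
```

Both forms visit exactly the observed (u, i) pairs. A user with no ratings adds nothing, which matches the convention that an empty sum is 0.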

02

M1 · L3 — Set Notation

The RS set vocabulary

Sets you'll see in
every RS paper

Notation	Read As	RS Meaning	Example Use
𝒪	"observed set"	All known (u,i) interaction pairs	(u,i) ∈ 𝒪 means we know r_{ui}
𝒩_u	"neighbourhood of u"	Items user u has interacted with	i ∈ 𝒩_u → u rated i
ℐ⁺_u	"positive items of u"	Items u has interacted with (same as 𝒩_u)	{i ∈ ℐ | r_{ui} > 0}
ℐ⁻_u	"negative items of u"	Items u has NOT interacted with	Sampled as negatives in BPR
𝒟	"training data"	Set of training triples (u,i,j)	(u,i,j) ∈ 𝒟 in BPR loss
|𝒰|	"cardinality of U"	Total number of users	|𝒰| = n = 6,038
{x | condition}	"set of x such that"	Filter a set by a condition	{i ∈ ℐ | r_{ui} > 0} = u's positive items
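The vocabulary in the table maps fairly directly onto Python sets. The sketch below uses made-up data: `ratings`, `users`, `items` and the helper names are hypothetical. It also assumes every stored rating is positive, so "not in ℐ⁺_u" and "not interacted with" coincide.

```python
# Toy data (hypothetical): ratings[u][i] plays the role of r_{ui}.
ratings = {
    "alice": {"item_a": 5.0, "item_b": 3.0},
    "bob": {"item_b": 4.0},
}
users = set(ratings)                     # the user set U
items = {"item_a", "item_b", "item_c"}   # the item set I

# O: every observed (u, i) pair.
observed = {(u, i) for u in ratings for i in ratings[u]}


def positive_items(u):
    """I+_u = {i in I | r_ui > 0}; unrated items are treated as 0."""
    return {i for i in items if ratings[u].get(i, 0.0) > 0}


def negative_items(u):
    """I-_u: items in I that are not in I+_u (set difference)."""
    return items - positive_items(u)


num_users = len(users)                   # |U|, the cardinality of U

print(sorted(positive_items("bob")), sorted(negative_items("bob")), num_users)
```

Membership tests such as `("bob", "item_b") in observed` read the same way as (u,i) ∈ 𝒪 in a paper.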

03

M1 · L3 — Set Builder Notation

Reading the filter operator

The vertical bar | means
"such that" — it filters

\[ \{i \in \mathcal{I}\ |\ r_{ui} > 0\} \]

Reading it

{

Start of a set definition

i ∈ ℐ

Consider all items i from the item set ℐ

|

"such that" — filter by the condition that follows

r_{ui} > 0

Only keep items where u's rating is positive

Plain English

The set of all items that user u has interacted with (positively rated).

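# {i ∈ I | r_{ui} > 0} in Python
# Assumes ratings[u][i] is defined for every i in I (0 if unrated).

# As a list comprehension (keeps the iteration order of I):
pos_items_u = [
    i for i in I
    if ratings[u][i] > 0
]

# As a set comprehension, the closest match to the notation:
pos_items_u_set = {i for i in I if ratings[u][i] > 0}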

|ℐ⁺_u| = the count of u's positive items (cardinality). The outer | · | bars mean "how many", not "such that".

04

M1 · L3 — Product Notation

When you see Π — think multiplication loop

Product notation and
why we always take log

\[ \prod_{i=1}^n x_i = x_1 \times x_2 \times \cdots \times x_n \]

In RS, Π appears in likelihood functions — the probability of observing all ratings assuming independence:

\[ \mathcal{L} = \prod_{(u,i)\in\mathcal{O}} P(r_{ui} \mid \theta) \]

"The probability of observing all ratings, given parameters θ."

Why we immediately take log:

\[ \ln \prod P_i = \sum \ln P_i \]

Reason 1

Numerical stability — products of small probabilities underflow to zero

Reason 2

Same optimum — log is monotone so argmax doesn't change

Reason 3

Clean gradients — d/dx ln(x) = 1/x
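Reasons 1 and 2 can be checked numerically. The sketch below uses invented numbers: 2000 observations with probability 0.01 each for the underflow check, and a hypothetical coin with 7 heads and 3 tails, searched over a grid of θ values, for the argmax check.

```python
import math

# Reason 1: 2000 hypothetical likelihood terms of 0.01 each.
probs = [0.01] * 2000
product = math.prod(probs)                  # true value 1e-4000 is below float range
log_sum = sum(math.log(p) for p in probs)   # stays an ordinary finite float

# Reason 2: a hypothetical coin with 7 heads and 3 tails, theta on a grid.
thetas = [k / 100 for k in range(1, 100)]


def likelihood(t):
    return t**7 * (1 - t) ** 3


def log_likelihood(t):
    return 7 * math.log(t) + 3 * math.log(1 - t)


best_by_product = max(thetas, key=likelihood)
best_by_log = max(thetas, key=log_likelihood)

print(product, log_sum)
print(best_by_product, best_by_log)
```

The raw product collapses to zero in double precision, while the log-sum keeps the full information. Both objectives pick the same θ on the grid.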

05

M1 · L3 — BPR Loss Decoded

Full worked example

Decoding the BPR loss
from scratch

\[ \mathcal{L}_{BPR} = -\sum_{(u,i,j)\in\mathcal{D}} \ln\sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda\|\theta\|^2 \]

Every symbol decoded

∑_{(u,i,j)∈𝒟}

Loop over all (user, positive item, negative item) triples in training data

ŷ_{ui} − ŷ_{uj}

Difference between model score for positive item and negative item

σ(·)

Sigmoid: converts the score difference to a probability ∈ (0,1)

ln σ(·)

Log-probability — we maximise this (equivalent to minimising its negative)

−∑ ···

Negative sign: we minimise loss, so flip the maximisation

λ‖θ‖²

L2 regularisation: penalise large parameter values

Plain English

For every training triple, penalise the model when it scores a negative item higher than a positive item. Also penalise large parameter values to prevent overfitting.
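The decoded formula can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: the `scores` dictionary stands in for the model outputs ŷ, and `theta` is a flat list of parameters. Both are hypothetical and are kept independent of each other here, whereas in a real model the scores would be computed from θ.

```python
import math


def log_sigmoid(x):
    """Numerically stable ln(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bpr_loss(triples, scores, theta, lam):
    """-sum of ln sigmoid(y_ui - y_uj) over (u, i, j), plus lam * ||theta||^2."""
    ranking_term = -sum(
        log_sigmoid(scores[(u, i)] - scores[(u, j)]) for u, i, j in triples
    )
    l2_term = lam * sum(w * w for w in theta)
    return ranking_term + l2_term


# Hypothetical scores and parameters, for illustration only.
scores = {("u1", "a"): 2.0, ("u1", "b"): 0.5, ("u1", "c"): 2.0}
theta = [0.5, -0.5]

good = bpr_loss([("u1", "a", "b")], scores, theta, lam=0.0)  # positive scored higher
bad = bpr_loss([("u1", "b", "a")], scores, theta, lam=0.0)   # positive scored lower
tie = bpr_loss([("u1", "a", "c")], scores, theta, lam=0.0)   # equal scores

print(good, tie, bad)
```

With equal scores the per-triple loss is ln 2. It shrinks toward 0 as the positive item pulls ahead and grows when the negative item is scored higher, which is the behaviour the plain-English reading describes.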

06

M1 · L3 — Key Takeaways

What to remember

01

Σ = for-loop

Every summation is a loop. Nested Σ = nested loop. Translate mechanically. Always works.

02

| inside {·} = "such that"

{i ∈ ℐ | r_{ui} > 0} is a filter operation. A pair of bars wrapped around a set, as in |ℐ⁺_u|, means cardinality (count). Context distinguishes them.

03

Π → log → Σ

Product of probabilities always becomes sum of log-probabilities. This is a reflex move in ML derivations.

Next: M1 · L4 — Norms, Distances & Regularisation
