M1 · Lesson 2 — Math Notation Literacy

Subscripts, Superscripts & Indexing

The same letter with different subscripts means completely different things.
Mastering indexing is what makes equations stop looking like noise.

01

M1 · L2 — Index Vocabulary

The standard RS index convention

What each index letter refers to

Index	Refers To	Example	Plain English
u	a specific user	p_u, r_{ui}, e_u^{(l)}	"for user u"
i	a specific item (positive)	q_i, r_{ui}, ŷ_{ui}	"for item i"
j	another item (usually negative)	ŷ_{uj}, q_j	"for negative item j" (BPR)
k	a latent dimension	p_{u,k}, e_{i,k}	"the k-th latent feature"
t	time step or iteration	x^{(t)}, θ^{(t)}	"at time step t"
l	layer in a neural network	h^{(l)}, e_u^{(l)}	"at layer l"
n, m	sizes / counts	n = |𝒰|, m = |ℐ|	"total number of users/items"

Universal convention: u = user, i = item. This is consistent across virtually every RS paper. Once you see it, you can't unsee it. j is almost always the negative item in pairwise ranking.

02

M1 · L2 — Reading Subscripts

Subscripts — which instance

Subscripts tell you which specific object

Subscript = which instance of the variable. Think of it as an array index.

\[ r_{ui} \quad \rightarrow \quad \text{rating by user } u \text{ for item } i \]

\[ \mathbf{e}_i^{(l)} \quad \rightarrow \quad \text{embedding of item } i \text{ at layer } l \]

\[ p_{u,k} \quad \rightarrow \quad k\text{-th dim of user }u\text{'s embedding} \]

Reading rule: "the [variable name] of [subscript meaning]". So p_{u,k} reads as "p of user u, dimension k", i.e. the k-th entry of user u's vector p_u.
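Because a subscript works like an array index, the reading rule maps directly onto code. A minimal NumPy sketch, assuming a toy rating matrix R (rows = users, columns = items, 0 = unobserved) and a user-embedding matrix P whose row u is p_u; the sizes and values are made up for illustration. Note that Python counts from 0, while papers often number users and items from 1.

```python
import numpy as np

# Hypothetical toy sizes: n users, m items, d latent dimensions
n, m, d = 3, 4, 2

# R: rating matrix, one row per user u, one column per item i (0 = unobserved)
R = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 1],
    [2, 0, 0, 5],
])

# P: user embedding matrix, row u is the vector p_u
P = np.array([
    [0.1, 0.9],
    [0.4, 0.2],
    [0.7, 0.3],
])
assert R.shape == (n, m) and P.shape == (n, d)

u, i, k = 0, 2, 1
r_ui = R[u, i]   # r_{ui}: rating by user u for item i -> 3
p_uk = P[u, k]   # p_{u,k}: k-th dimension of user u's embedding -> 0.9
print(r_ui, p_uk)
```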

Combined subscripts — just read left to right:

Decoding combined subscripts

e_{u,k}^{(l)}

k-th dimension of user u's embedding at layer l

W_{ij}^{(l)}

Weight connecting node i to node j at layer l

h_{u,t}

Hidden state of user u at time step t

α_{u,i}^{(l)}

Attention weight from u to i at layer l

r̂_{ui}^{(k)}

Predicted rating for (u,i) from component k
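Combined subscripts nest the same way. The sketch below assumes, purely for illustration, that all layer embeddings are stacked in one NumPy array with the layer index l as the leading axis, so e_{u,k}^{(l)} becomes E_user[l, u, k]; the array name and sizes are hypothetical, not a convention the notation dictates.

```python
import numpy as np

L, n, d = 2, 3, 4          # hypothetical: L propagation layers, n users, d dims
rng = np.random.default_rng(0)

# Stack user embeddings for layers 0..L into one array of shape (L+1, n, d),
# so the superscript (l) becomes the first axis.
E_user = rng.normal(size=(L + 1, n, d))

l, u, k = 1, 2, 3
e_u_l = E_user[l, u]        # e_u^{(l)}: user u's whole embedding vector at layer l
e_uk_l = E_user[l, u, k]    # e_{u,k}^{(l)}: one number, the k-th entry of that vector

assert e_u_l.shape == (d,)
assert e_uk_l == e_u_l[k]
```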

03

M1 · L2 — Superscripts

Superscripts — layer, time, or special marker

What superscripts commonly signal

Superscript type 1

Layer index (l)

\( \mathbf{e}_u^{(l)} \) = user u's embedding at graph layer l. Parentheses around the superscript signal it's a layer counter, not an exponent.

Superscript type 2

Time step (t)

\( \theta^{(t)} \) = parameters at training iteration t. \( \mathbf{x}^{(t)} \) = feature vector at time t in sequential RS.

Superscript type 3

Special marker

\( \mathcal{I}_u^+ \) = positive items of user u (⁺ means positive set). \( \mathbf{e}^* \) = optimal or final embedding (* means final/optimal).
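The ⁺ marker is easy to make concrete. Assuming a hypothetical binary interaction matrix Y (1 = user u interacted with item i), I_u^+ is just the set of column indices where row u is 1; the remaining items form the pool a negative item is usually drawn from. The helper names below are illustrative, not from any library.

```python
import numpy as np

# Hypothetical implicit-feedback matrix: Y[u, i] = 1 if user u interacted with item i
Y = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
])


def positive_items(Y, u):
    """I_u^+: the set of items user u has interacted with."""
    return set(np.flatnonzero(Y[u]).tolist())


def negative_items(Y, u):
    """Items outside I_u^+, i.e. candidates for a negative item."""
    return set(np.flatnonzero(Y[u] == 0).tolist())


print(positive_items(Y, 0))   # {0, 2}
```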

Critical distinction: \( \mathbf{e}^{(2)} \) (superscript in parentheses) = layer 2 embedding. \( e^2 \) (no parentheses) = e squared. The parentheses matter!

\[ \mathbf{e}_u^* = \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{e}_u^{(l)} \quad \leftarrow \text{ final embedding = average of all layer embeddings} \]
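To see the index-versus-exponent distinction and the layer average side by side, here is a small NumPy sketch with hypothetical embeddings for one user at layers 0, 1, 2 (so L = 2, d = 3). Indexing with [2] selects e_u^{(2)}; `** 2` squares it elementwise, which is a different object.

```python
import numpy as np

# Hypothetical embeddings of one user u at layers l = 0, 1, 2 (so L = 2), d = 3
layers = np.array([
    [1.0, 2.0, 3.0],   # e_u^{(0)}
    [3.0, 2.0, 1.0],   # e_u^{(1)}
    [2.0, 5.0, 2.0],   # e_u^{(2)}
])
L = layers.shape[0] - 1

e_u_layer2 = layers[2]          # e_u^{(2)}: the layer-2 embedding (an index, not a power)
e_u_squared = layers[2] ** 2    # elementwise square of that vector: a power, a different object

# e_u^* = 1/(L+1) * sum_{l=0}^{L} e_u^{(l)}
e_u_star = layers.sum(axis=0) / (L + 1)
print(e_u_star)   # [2. 3. 2.]
```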

04

M1 · L2 — Bold vs Plain

The most commonly missed distinction

Bold vs plain: matrix, vector, or scalar

Notation	Type	Example
R (bold upper)	Matrix	R ∈ ℝ^{n×m} = full rating matrix (rows = users, columns = items)
p_u (bold lower)	Vector	p_u ∈ ℝ^d = user u's embedding vector
r_{ui} (plain lower)	Scalar	r_{ui} ∈ ℝ = one rating value
𝒰 (calligraphic)	Set	𝒰 = {u₁, u₂, ..., uₙ}
|𝒰| (cardinality)	Integer	|𝒰| = n = number of users

❌ Common misread

Reading r_u and r_{ui} as the same thing. They're not: r_u (bold) is a vector of all of u's ratings, while r_{ui} (plain) is one number.

✅ Correct reading

r_u = the full row of user u in the rating matrix. r_{ui} = one cell from that row. Bold = vector (all items), plain = scalar (one item).

Quick test: if a variable holds a single number, it's plain. If it holds a list of numbers, it's bold lowercase. If it holds a grid of numbers, it's bold uppercase.
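The bold/plain distinction shows up in code as the number of array dimensions. A sketch with made-up sizes and user names, storing users as rows (the reading where r_u is user u's row): a matrix has `ndim` 2, a vector `ndim` 1, a scalar `ndim` 0, and a set's cardinality is its `len`.

```python
import numpy as np

users = {"alice", "bob", "carol"}     # 𝒰: a set of users (calligraphic), hypothetical names
n = len(users)                        # |𝒰| = n: an integer count
m = 4                                 # hypothetical number of items

R = np.arange(n * m).reshape(n, m)    # R: matrix (bold upper), shape (n, m), one row per user
u, i = 1, 2                           # 0-based user and item indices
r_u = R[u]                            # r_u: vector (bold lower), user u's full row
r_ui = R[u, i]                        # r_{ui}: scalar (plain), one cell of that row

print(R.ndim, r_u.ndim, np.ndim(r_ui))   # 2 1 0
```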

05

M1 · L2 — The Hat Notation

One symbol that appears everywhere

The hat (^) means "predicted" or "estimated"

In ML and RS, a hat over a variable means it's the model's prediction, not the ground truth.

Notation	Meaning
r_{ui}	True observed rating (ground truth)
r̂_{ui}	Predicted rating (model output)
ŷ_{ui}	Predicted preference score
θ̂	Estimated/fitted parameters

In rating prediction, the point of training is to make r̂_{ui} as close as possible to r_{ui} (or ŷ_{ui} to y_{ui} when the target is a preference score). The loss function measures the gap between hat and no-hat.

\[ \mathcal{L} = \sum_{(u,i)\in\mathcal{O}} (r_{ui} - \hat{r}_{ui})^2 + \lambda\|\theta\|^2 \]

Decoded

r_{ui}

True rating (no hat = observed truth)

r̂_{ui}

Predicted rating (hat = model output)

(r - r̂)²

Squared error between truth and prediction
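The loss can be computed term by term. A minimal sketch, assuming a matrix-factorisation model r̂_{ui} = p_u · q_i (an assumption for illustration; the loss itself does not care how r̂ is produced), toy values, 𝒪 given as a list of observed (u, i) pairs, and θ taken to be the user and item matrices P and Q. The function name is hypothetical.

```python
import numpy as np


def squared_error_loss(R, R_hat, observed, params, lam):
    """L = sum over observed (u, i) of (r_ui - r̂_ui)^2 + lam * ||theta||^2.

    observed: list of (u, i) index pairs, i.e. the set O
    params:   list of parameter arrays making up theta
    """
    data_term = sum((R[u, i] - R_hat[u, i]) ** 2 for u, i in observed)
    reg_term = lam * sum(np.sum(p ** 2) for p in params)
    return data_term + reg_term


# Hypothetical matrix-factorisation setup: r̂_ui = p_u · q_i
P = np.array([[1.0, 0.0], [0.0, 1.0]])   # user vectors p_u (rows)
Q = np.array([[2.0, 1.0], [1.0, 3.0]])   # item vectors q_i (rows)
R = np.array([[2.0, 0.0], [0.0, 2.0]])   # observed ratings (0 = missing)
R_hat = P @ Q.T                           # all predictions r̂_ui at once
O = [(0, 0), (1, 1)]                      # observed (u, i) pairs

loss = squared_error_loss(R, R_hat, O, [P, Q], lam=0.1)
print(round(float(loss), 4))   # 2.7
```

Here the only error comes from (u, i) = (1, 1) in 0-based indexing, where r̂ = 3 against r = 2; the regularisation term adds 0.1 × 17.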

06

M1 · L2 — Worked Example

Full decoding: the generic GNN layer equation behind LightGCN

Reading a complex indexed expression

\[ \mathbf{e}_u^{(l+1)} = \text{AGG}\left(\mathbf{e}_u^{(l)},\ \left\{\mathbf{e}_i^{(l)}\ :\ i \in \mathcal{N}_u \right\} \right) \]

Symbol by symbol

e_u^{(l+1)}

User u's embedding at the NEXT layer (l+1)

AGG(·)

Aggregation function — combines information from neighbours

e_u^{(l)}

User u's OWN embedding at current layer (l)

{e_i^{(l)} : i ∈ 𝒩_u}

Set of embeddings of all items i in u's neighbourhood, at current layer

𝒩_u

Neighbourhood of u — the items user u has interacted with

Plain English

User u's new embedding is computed by aggregating u's own current embedding with the current embeddings of all items u has interacted with.
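The generic AGG leaves the aggregator open. As one concrete choice, the sketch below uses the LightGCN-style rule e_u^{(l+1)} = Σ_{i∈𝒩_u} e_i^{(l)} / √(|𝒩_u||𝒩_i|), which, unlike the generic form, drops the self term e_u^{(l)} and uses no weight matrix or nonlinearity. It then averages layers 0..L into e_u^*, as in the final-embedding formula earlier in this lesson. The interaction matrix, sizes, and function name are hypothetical, and the code assumes every user and item has at least one interaction.

```python
import numpy as np

# Hypothetical interactions: Y[u, i] = 1 if item i is in N_u (2 users, 3 items)
Y = np.array([
    [1, 1, 0],
    [0, 1, 1],
], dtype=float)


def lightgcn_layer(E_users, E_items, Y):
    """One LightGCN-style propagation step (normalised neighbour sum, no self term)."""
    deg_u = Y.sum(axis=1)                          # |N_u| for every user
    deg_i = Y.sum(axis=0)                          # |N_i| for every item
    # Entry (u, i) becomes 1 / sqrt(|N_u| * |N_i|) where u and i interact, else 0.
    # Assumes no zero degrees; an isolated user or item would divide by zero.
    norm = Y / np.sqrt(np.outer(deg_u, deg_i))
    new_users = norm @ E_items                     # e_u^{(l+1)} from neighbouring items
    new_items = norm.T @ E_users                   # e_i^{(l+1)} from neighbouring users
    return new_users, new_items


rng = np.random.default_rng(0)
d, L = 4, 2                                        # embedding size, number of layers
E_u = [rng.normal(size=(Y.shape[0], d))]           # e_u^{(0)} for all users
E_i = [rng.normal(size=(Y.shape[1], d))]           # e_i^{(0)} for all items

for l in range(L):                                 # compute layers 1..L
    new_u, new_i = lightgcn_layer(E_u[l], E_i[l], Y)
    E_u.append(new_u)
    E_i.append(new_i)

# e_u^* = 1/(L+1) * sum over l = 0..L of e_u^{(l)}
e_u_star = np.mean(np.stack(E_u), axis=0)
```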

07

M1 · L2 — BPR Indexing

Indexing in pairwise ranking

Understanding u, i, j in BPR notation

BPR introduces a triple (u, i, j) where three different subscripts play three different roles:

The BPR triple

u

The user — same as always

i

A POSITIVE item — one u has interacted with

j

A NEGATIVE item — one u has NOT interacted with

ŷ_{ui} > ŷ_{uj}

We want the model to score i higher than j for user u

\[ \mathcal{L}_{BPR} = -\sum_{(u,i,j)\in\mathcal{D}} \ln\sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda\|\theta\|^2 \]

Plain English

For every (user, positive item, negative item) triple, the loss is large when the model scores the negative item j above the positive item i, and it shrinks as the margin ŷ_{ui} - ŷ_{uj} grows.
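The BPR loss can be evaluated directly from each triple's two scores. A minimal NumPy sketch, assuming the scores come from hypothetical dot products p_u · q_i; ln σ(x) is computed as `-np.logaddexp(0, -x)`, which equals -ln(1 + e^{-x}) without overflowing for large negative margins. The function name is illustrative.

```python
import numpy as np


def bpr_loss(y_ui, y_uj, params, lam):
    """L_BPR = -sum ln sigma(ŷ_ui - ŷ_uj) + lam * ||theta||^2.

    y_ui, y_uj: arrays of scores for the positive and the negative item of each triple.
    """
    diff = y_ui - y_uj
    # ln sigma(x) = -ln(1 + exp(-x)), computed stably via logaddexp(0, -x)
    log_sigmoid = -np.logaddexp(0.0, -diff)
    reg = lam * sum(np.sum(p ** 2) for p in params)
    return -np.sum(log_sigmoid) + reg


P = np.array([[1.0, 0.0]])          # p_u for a single user u = 0
Q = np.array([[2.0, 0.0],           # q_i: positive item i = 0
              [0.0, 1.0]])          # q_j: negative item j = 1
y_ui = np.array([P[0] @ Q[0]])      # ŷ_ui = 2.0
y_uj = np.array([P[0] @ Q[1]])      # ŷ_uj = 0.0

right_order = bpr_loss(y_ui, y_uj, [P, Q], lam=0.0)  # i scored above j: small loss
wrong_order = bpr_loss(y_uj, y_ui, [P, Q], lam=0.0)  # roles swapped: large loss
assert right_order < wrong_order
```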

08

M1 · L2 — Key Takeaways

What to remember

01

u=user, i=item, j=negative

The universal RS index vocabulary. Also: k=dimension, l=layer, t=time. Memorise these; they appear in almost every RS paper.

02

^ = predicted

ŷ_{ui} is the model's output; y_{ui} is the truth. Pointwise losses such as squared error measure the gap between hat and no-hat; pairwise losses such as BPR compare two hats, ŷ_{ui} and ŷ_{uj}.

03

Bold = vector/matrix. Plain = scalar.

p_u (bold) = entire embedding vector. p_{u,k} (plain) = one number. Never confuse the two.

Next: M1 · L3 — Summation, Product & Set Notation
