M1 · Lesson 2 — Math Notation Literacy

Subscripts, Superscripts & Indexing

The same letter with different subscripts means completely different things.
Mastering indexing is what makes equations stop looking like noise.

01
M1 · L2 — Index Vocabulary

The standard RS index convention

What each index letter refers to

Index   Refers To                          Example                   Plain English
u       a specific user                    p_u, r_{ui}, e_u^{(l)}    "for user u"
i       a specific item (positive)         q_i, r_{ui}, ŷ_{ui}       "for item i"
j       another item (usually negative)    ŷ_{uj}, q_j               "for negative item j" (BPR)
k       a latent dimension                 p_{u,k}, e_{i,k}          "the k-th latent feature"
t       time step or iteration             x^{(t)}, θ^{(t)}          "at time step t"
l       layer in a neural network          h^{(l)}, e_u^{(l)}        "at layer l"
n, m    sizes / counts                     n = |𝒰|, m = |ℐ|          "total number of users/items"

Universal convention: u = user, i = item. This is consistent across virtually every RS paper. Once you see it, you can't unsee it. j is almost always the negative item in pairwise ranking.

02
M1 · L2 — Reading Subscripts

Subscripts — which instance

Subscripts tell you which specific object

Subscript = which instance of the variable. Think of it as an array index.

\[ r_{ui} \quad \rightarrow \quad \text{rating by user } u \text{ for item } i \]
\[ \mathbf{e}_i^{(l)} \quad \rightarrow \quad \text{embedding of item } i \text{ at layer } l \]
\[ p_{u,k} \quad \rightarrow \quad k\text{-th dim of user }u\text{'s embedding} \]

Reading rule: "The [variable name] of [subscript meaning]". So p_{u,k} = "the p entry for user u, dimension k" (an entry of the embedding, not a statistical p-value).
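
Taking the array-index analogy literally, a minimal Python sketch might look like the following. The tables `R` and `P` and all their values are made up for illustration, and indices are 0-based here, whereas papers usually count from 1.

```python
# Hypothetical toy data: 2 users, 3 items, 2 latent dimensions.
R = [
    [5.0, 3.0, 0.0],  # ratings given by user 0
    [4.0, 0.0, 1.0],  # ratings given by user 1
]
P = [
    [0.1, 0.9],  # embedding of user 0
    [0.7, 0.2],  # embedding of user 1
]

u, i, k = 1, 2, 0

r_ui = R[u][i]  # r_{ui}: rating by user u for item i
p_u = P[u]      # p_u: the whole embedding vector of user u
p_uk = P[u][k]  # p_{u,k}: k-th dimension of user u's embedding

print(r_ui, p_u, p_uk)
```

Each subscript becomes one pair of square brackets, read in the same left-to-right order as in the formula.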

Combined subscripts — just read left to right:

Decoding combined subscripts
e_{u,k}^{(l)}
k-th dimension of user u's embedding at layer l
W_{ij}^{(l)}
Weight connecting node i to node j at layer l
h_{u,t}
Hidden state of user u at time step t
α_{u,i}^{(l)}
Attention weight from u to i at layer l
r̂_{ui}^{(k)}
Predicted rating for (u,i) from component k
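
The combined indices above can be sketched as nested lookups. This is only an illustration: the containers `E` and `H`, their values, and the choice to put the layer on the outermost axis are all assumptions, not something the notation itself prescribes.

```python
# Hypothetical embeddings indexed as E[l][u][k]: layer, then user, then dimension.
E = [
    [[0.1, 0.2], [0.3, 0.4]],  # layer 0: users 0 and 1
    [[0.5, 0.6], [0.7, 0.8]],  # layer 1: users 0 and 1
]
# Hypothetical hidden states indexed as H[u][t]: user, then time step.
H = [
    [[1.0], [2.0], [3.0]],  # user 0 at t = 0, 1, 2
    [[4.0], [5.0], [6.0]],  # user 1 at t = 0, 1, 2
]

l, u, k, t = 1, 0, 1, 2

e_ukl = E[l][u][k]  # e_{u,k}^{(l)}: k-th dimension of user u's embedding at layer l
h_ut = H[u][t]      # h_{u,t}: hidden state of user u at time step t

print(e_ukl, h_ut)
```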
03
M1 · L2 — Superscripts

Superscripts — layer, time, or special marker

What superscripts commonly signal

Superscript type 1

Layer index (l)

\( \mathbf{e}_u^{(l)} \) = user u's embedding at graph layer l. Parentheses around the superscript signal it's a layer counter, not an exponent.

Superscript type 2

Time step (t)

\( \theta^{(t)} \) = parameters at training iteration t. \( \mathbf{x}^{(t)} \) = feature vector at time t in sequential RS.

Superscript type 3

Special marker

\( \mathcal{I}_u^+ \) = positive items (⁺ means positive set). \( \mathbf{e}^* \) = optimal or final embedding (⁎ means final/optimal).

Critical distinction: \( \mathbf{e}^{(2)} \) (superscript in parentheses) = layer 2 embedding. \( e^2 \) (no parentheses) = e squared. The parentheses matter!

\[ \mathbf{e}_u^* = \frac{1}{L+1} \sum_{l=0}^{L} \mathbf{e}_u^{(l)} \quad \leftarrow \text{ final embedding = average of all layer embeddings} \]
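
As a rough illustration of the averaging formula, the sketch below treats the parenthesised superscript as a list index. The list `E_u`, its values, and the helper `final_embedding` are hypothetical names chosen for this example.

```python
# Hypothetical layer-wise embeddings of one user u: E_u[l] plays the role of e_u^{(l)}.
E_u = [
    [1.0, 0.0],  # l = 0
    [0.0, 2.0],  # l = 1
    [2.0, 4.0],  # l = 2, so L = 2
]

def final_embedding(layers):
    """Average the per-layer vectors: e_u^* = 1/(L+1) * sum over l of e_u^{(l)}."""
    num_layers = len(layers)  # equals L + 1, because l runs from 0 to L
    dim = len(layers[0])
    return [sum(layer[k] for layer in layers) / num_layers for k in range(dim)]

e_star = final_embedding(E_u)

layer_2 = E_u[2]       # e^{(2)}: the embedding at layer 2 (an index)
squared = 3.0 ** 2     # e^2 with no parentheses: an ordinary exponent

print(e_star, layer_2, squared)
```

Note that the divisor is L + 1 rather than L, because layer 0 is included in the sum.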
04
M1 · L2 — Bold vs Plain

The most commonly missed distinction

Bold vs plain: matrix, vector, or scalar

Notation                Type       Example
R (bold upper)          Matrix     R ∈ ℝ^{n×m} = full rating matrix (n users × m items)
p_u (bold lower)        Vector     p_u ∈ ℝ^d = user u's embedding vector
r_{ui} (plain lower)    Scalar     r_{ui} ∈ ℝ = one rating value
𝒰 (calligraphic)        Set        𝒰 = {u₁, u₂, ..., uₙ}
|𝒰| (cardinality)       Integer    |𝒰| = n = number of users
❌ Common misread

Reading r_u and r_{ui} as the same thing. They're not. r_u is a vector (all of u's ratings). r_{ui} is one number.

✅ Correct reading

r_u = the full row of user u in the rating matrix. r_{ui} = one cell from that row. Bold = vector (all items), plain = scalar (one item).

Quick test: If a variable could be one number — it's plain. If it's a list of numbers — it's bold. If it's a grid of numbers — it's bold uppercase.
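
The quick test can be sketched in code by looking at nesting depth. This assumes the rating matrix is stored as a list of rows with one row per user and one column per item; the data and the helper `kind` are hypothetical.

```python
# Hypothetical rating matrix R: n = 2 users (rows), m = 3 items (columns).
R = [
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
]

n = len(R)     # number of users
m = len(R[0])  # number of items

u, i = 0, 1
r_u = R[u]      # bold r_u: vector of all of user u's ratings (length m)
r_ui = R[u][i]  # plain r_{ui}: a single number

def kind(x):
    """Classify a value as scalar, vector, or matrix by its nesting depth."""
    if not isinstance(x, list):
        return "scalar"
    return "matrix" if isinstance(x[0], list) else "vector"

print(kind(R), kind(r_u), kind(r_ui))
```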

05
M1 · L2 — The Hat Notation

One symbol that appears everywhere

The hat (^) always means "predicted"

In ML and RS, a hat over a variable means it is the model's prediction or estimate, not the ground truth.

Notation    Meaning
r_{ui}      True observed rating (ground truth)
r̂_{ui}      Predicted rating (model output)
ŷ_{ui}      Predicted preference score
θ̂           Estimated/fitted parameters

The entire point of training is to make ŷ_{ui} as close as possible to y_{ui} (or r_{ui}). The loss function measures the gap between hat and no-hat.

\[ \mathcal{L} = \sum_{(u,i)\in\mathcal{O}} (r_{ui} - \hat{r}_{ui})^2 + \lambda\|\theta\|^2 \]
Decoded
r_{ui}
True rating (no hat = observed truth)
r̂_{ui}
Predicted rating (hat = model output)
(r - r̂)²
Squared error between truth and prediction
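
To make the loss concrete, here is a small sketch that evaluates it on made-up numbers. The dictionaries, the flat parameter list `theta`, and the value of `lam` are assumptions for illustration; a real model would produce the predictions itself.

```python
# Hypothetical observed set O and model predictions, both keyed by (u, i).
observed = {(0, 0): 5.0, (0, 1): 3.0, (1, 2): 1.0}   # r_{ui}: no hat, ground truth
predicted = {(0, 0): 4.5, (0, 1): 3.5, (1, 2): 2.0}  # r-hat_{ui}: model output

theta = [0.5, -1.0]  # stand-in for all model parameters
lam = 0.1            # regularisation strength (lambda)

def squared_error_loss(observed, predicted, theta, lam):
    """Sum of squared errors over observed pairs plus an L2 penalty on theta."""
    error = sum((r - predicted[ui]) ** 2 for ui, r in observed.items())
    penalty = lam * sum(w ** 2 for w in theta)
    return error + penalty

loss = squared_error_loss(observed, predicted, theta, lam)
print(loss)
```

The sum runs only over pairs in the observed set, which is why the loop iterates over `observed` rather than over every possible (u, i).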
06
M1 · L2 — Worked Example

Full decoding — LightGCN layer equation

Reading a complex indexed expression

\[ \mathbf{e}_u^{(l+1)} = \text{AGG}\left(\mathbf{e}_u^{(l)},\ \left\{\mathbf{e}_i^{(l)}\ :\ i \in \mathcal{N}_u\right\}\right) \]
Symbol by symbol
e_u^{(l+1)}
User u's embedding at the NEXT layer (l+1)
AGG(·)
Aggregation function — combines information from neighbours
e_u^{(l)}
User u's OWN embedding at current layer (l)
{e_i^{(l)} : i ∈ 𝒩_u}
Set of embeddings of all items i in u's neighbourhood, at current layer
𝒩_u
Neighbourhood of u — the items user u has interacted with
Plain English
User u's new embedding is computed by aggregating u's own current embedding with the current embeddings of all items u has interacted with.
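
The equation leaves AGG abstract. The sketch below assumes, purely for illustration, that AGG is a plain element-wise mean over u's own embedding and its neighbours' embeddings; this is one possible choice, not the aggregator of any specific model. All names and values are hypothetical.

```python
# Hypothetical layer-l embeddings.
user_emb = {"u1": [1.0, 1.0]}                    # e_u^{(l)}
item_emb = {
    "i1": [3.0, 0.0],
    "i2": [2.0, 5.0],
    "i3": [9.0, 9.0],                            # not a neighbour of u1
}                                                # e_i^{(l)}
neighbours = {"u1": ["i1", "i2"]}                # N_u: items u has interacted with

def agg_mean(own, neighbour_vectors):
    """Assumed AGG: element-wise mean of the own vector and the neighbour vectors."""
    vectors = [own] + neighbour_vectors
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def next_layer(u):
    """e_u^{(l+1)} = AGG(e_u^{(l)}, {e_i^{(l)} : i in N_u})."""
    neighbour_vectors = [item_emb[i] for i in neighbours[u]]
    return agg_mean(user_emb[u], neighbour_vectors)

e_next = next_layer("u1")
print(e_next)
```

Item i3 never enters the computation: the set-builder condition i ∈ 𝒩_u is what the list comprehension over `neighbours[u]` expresses.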
07
M1 · L2 — BPR Indexing

Indexing in pairwise ranking

Understanding u, i, j in BPR notation

BPR introduces a triple (u, i, j) where three different subscripts play three different roles:

The BPR triple
u
The user — same as always
i
A POSITIVE item — one u has interacted with
j
A NEGATIVE item — one u has NOT interacted with
ŷ_{ui} > ŷ_{uj}
We want the model to score i higher than j for user u
\[ \mathcal{L}_{BPR} = -\sum_{(u,i,j)\in\mathcal{D}} \ln\sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda\|\theta\|^2 \]
Plain English
For every (user, positive item, negative item) triple, penalise the model when it scores the negative item j higher than the positive item i. The larger ŷ_{ui} - ŷ_{uj}, the lower the loss.
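
A minimal sketch of this loss on made-up scores is shown below. The score dictionaries, the triple list, and the helper names are hypothetical; the regulariser is switched off here (empty `theta`, `lam = 0`) so that only the ranking term is visible.

```python
import math

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def bpr_loss(triples, score, theta, lam):
    """-sum of ln sigma(y_ui - y_uj) over (u, i, j) triples, plus lam * ||theta||^2."""
    ranking = -sum(
        math.log(sigmoid(score[(u, i)] - score[(u, j)]))
        for u, i, j in triples
    )
    penalty = lam * sum(w ** 2 for w in theta)
    return ranking + penalty

# Hypothetical predicted scores keyed by (user, item).
score_good = {(0, "i"): 2.0, (0, "j"): -1.0}  # positive item scored above negative
score_bad = {(0, "i"): -1.0, (0, "j"): 2.0}   # negative item scored above positive
triples = [(0, "i", "j")]                     # one (u, i, j) triple

loss_good = bpr_loss(triples, score_good, theta=[], lam=0.0)
loss_bad = bpr_loss(triples, score_bad, theta=[], lam=0.0)
print(loss_good < loss_bad)
```

Swapping which item gets the higher score flips the sign of the difference inside the sigmoid, which is what moves the loss from small to large.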
08
M1 · L2 — Key Takeaways

What to remember

01

u=user, i=item, j=negative

The universal RS index vocabulary. Also: k=dimension, l=layer, t=time. Memorise these: they appear in virtually every RS paper.

02

^ = predicted

ŷ_{ui} is the model's output. y_{ui} is the truth. Pointwise losses measure the gap between hat and no-hat; pairwise losses such as BPR compare two hatted scores, ŷ_{ui} and ŷ_{uj}.

03

Bold = vector/matrix. Plain = scalar.

p_u (bold) = entire embedding vector. p_{u,k} (plain) = one number. Never confuse the two.

Next: M1 · L3 — Summation, Product & Set Notation
