The same letter with different subscripts means completely different things.
Mastering indexing is what makes equations stop looking like noise.
| Index | Refers To | Example | Plain English |
|---|---|---|---|
| u | a specific user | p_u, r_{ui}, e_u^{(l)} | "for user u" |
| i | a specific item (positive) | q_i, r_{ui}, ŷ_{ui} | "for item i" |
| j | another item (usually negative) | ŷ_{uj}, q_j | "for negative item j" (BPR) |
| k | a latent dimension | p_{u,k}, e_{i,k} | "the k-th latent feature" |
| t | time step or iteration | x^{(t)}, θ^{(t)} | "at time step t" |
| l | layer in a neural network | h^{(l)}, e_u^{(l)} | "at layer l" |
| n, m | sizes / counts | n = \|𝒰\|, m = \|ℐ\| | "total number of users / items" |
Universal convention: u = user, i = item. This is consistent across virtually every RS paper. Once you see it, you can't unsee it. j is almost always the negative item in pairwise ranking.
Subscript = which instance of the variable. Think of it as an array index.
Reading rule: "The [variable name] of [subscript meaning]". So p_{u,k} = "the p of user u, dimension k", i.e. the k-th entry of user u's vector p_u.
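To make the array-index analogy concrete, here is a minimal Python sketch. The table `P`, its values, and the 0-based indices are made up for illustration; papers usually count from 1.

```python
# Hypothetical user-embedding table: one row per user,
# one column per latent dimension.
P = [
    [0.1, 0.7, -0.3],   # user 0
    [0.5, -0.2, 0.9],   # user 1
]

u, k = 1, 2

p_u = P[u]       # p_u: the whole embedding vector of user u
p_uk = P[u][k]   # p_{u,k}: entry k of that vector, a single number

print(p_u)   # [0.5, -0.2, 0.9]
print(p_uk)  # 0.9
```

Each subscript in the formula becomes one index in the lookup, which is all that "which instance of the variable" means.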
Combined subscripts — just read left to right:
\( \mathbf{e}_u^{(l)} \) = user u's embedding at graph layer l. Parentheses around the superscript signal it's a layer counter, not an exponent.
\( \theta^{(t)} \) = parameters at training iteration t. \( \mathbf{x}^{(t)} \) = feature vector at time t in sequential RS.
\( \mathcal{I}_u^+ \) = the set of user u's positive items (⁺ marks the positive set). \( \mathbf{e}^* \) = optimal or final embedding (⁎ means final/optimal).
Critical distinction: \( \mathbf{e}^{(2)} \) (superscript in parentheses) = layer 2 embedding. \( e^2 \) (no parentheses) = e squared. The parentheses matter!
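The difference is easy to see in code, where a parenthesised superscript turns into a lookup and a bare superscript turns into arithmetic. The list `e_layers` and its values are hypothetical, chosen only to illustrate the two readings.

```python
# Hypothetical embeddings of one node after each graph layer
# (layer 0 is the input embedding).
e_layers = [
    [1.0, 2.0],   # e^{(0)}
    [0.5, 1.5],   # e^{(1)}
    [0.2, 0.8],   # e^{(2)}
]

# e^{(2)}: the superscript in parentheses selects a layer (indexing).
e_at_layer_2 = e_layers[2]

# e^2: the bare superscript is an exponent (arithmetic on a scalar).
e = 3.0
e_squared = e ** 2

print(e_at_layer_2)  # [0.2, 0.8]
print(e_squared)     # 9.0
```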
| Notation | Type | Example |
|---|---|---|
| R (bold upper) | Matrix | R ∈ ℝ^{n×m} = full rating matrix (n users × m items) |
| p_u (bold lower) | Vector | p_u ∈ ℝ^d = user u's embedding vector |
| r_{ui} (plain lower) | Scalar | r_{ui} ∈ ℝ = one rating value |
| 𝒰 (calligraphic) | Set | 𝒰 = {u₁, u₂, ..., uₙ} |
| \|𝒰\| (cardinality) | Integer | \|𝒰\| = n = number of users |
Common mistake: reading r_u and r_{ui} as the same thing. They are not: r_u is a vector (all of u's ratings), while r_{ui} is one number.
r_u = the full row of user u in the rating matrix. r_{ui} = one cell from that row. Bold = vector (all items), plain = scalar (one item).
Quick test: if a variable holds one number, it's plain lowercase. If it holds a list of numbers, it's bold lowercase. If it holds a grid of numbers, it's bold uppercase.
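The same test can be run on a toy rating matrix. This is a small sketch with invented values; it assumes users are rows, items are columns, and 0 stands for "not rated", which is a convention chosen here and not something the notation itself fixes.

```python
# Hypothetical rating matrix R: one row per user, one column per item.
R = [
    [5, 0, 3],   # user 0
    [4, 2, 0],   # user 1
]

u, i = 0, 2

r_u = R[u]       # bold lowercase r_u: user u's full row, a vector
r_ui = R[u][i]   # plain r_{ui}: one cell of that row, a scalar

n_users = len(R)      # |U| = n
n_items = len(R[0])   # |I| = m

print(r_u)               # [5, 0, 3]
print(r_ui)              # 3
print(n_users, n_items)  # 2 3
```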
In ML and RS, a hat over a variable means it's the model's prediction, not the ground truth.
| Notation | Meaning |
|---|---|
| r_{ui} | True observed rating (ground truth) |
| r̂_{ui} | Predicted rating (model output) |
| ŷ_{ui} | Predicted preference score |
| θ̂ | Estimated/fitted parameters |
The entire point of training is to make ŷ_{ui} as close as possible to y_{ui} (or r_{ui}). The loss function measures the gap between hat and no-hat.
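As a rough illustration of "hat versus no-hat", the sketch below predicts one rating and measures the gap. It assumes a dot-product model and a squared-error loss, neither of which is prescribed by the text above; the embedding values are invented.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

p_u = [0.5, 1.0]   # hypothetical user embedding
q_i = [2.0, 3.0]   # hypothetical item embedding
r_ui = 5.0         # observed rating: no hat, ground truth

r_hat_ui = dot(p_u, q_i)        # predicted rating: hat, model output
loss = (r_ui - r_hat_ui) ** 2   # squared gap between truth and prediction

print(r_hat_ui)  # 4.0
print(loss)      # 1.0
```

Training would adjust `p_u` and `q_i` to shrink `loss`, which pulls the hatted value toward the un-hatted one.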
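BPR introduces a triple (u, i, j) in which three subscripts play three different roles: u is the user, i is a positive item (one that u interacted with), and j is a negative item (one that u did not). The model is trained so that ŷ_{ui} ranks above ŷ_{uj}.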
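A minimal sketch of one BPR triple, with u as the user, i as the positive item and j as the negative item, as in the index table. The dot-product scorer and all embedding values are assumptions made for illustration; the per-triple loss shown is the standard BPR term −ln σ(ŷ_{ui} − ŷ_{uj}) without regularisation.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

p_u = [1.0, 0.0]   # user u (hypothetical embedding)
q_i = [2.0, 1.0]   # item i: positive item for u
q_j = [0.5, 3.0]   # item j: negative item for u

y_hat_ui = dot(p_u, q_i)   # predicted score for the positive item
y_hat_uj = dot(p_u, q_j)   # predicted score for the negative item

# The loss is small when the positive item outscores the negative one.
bpr_loss = -math.log(sigmoid(y_hat_ui - y_hat_uj))

print(y_hat_ui, y_hat_uj)  # 2.0 0.5
print(bpr_loss)
```

Note that both scores share the subscript u: the comparison is always between two items for the same user.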
u = user, i = item, j = negative item: the universal RS index vocabulary. Also: k = dimension, l = layer, t = time. Memorise these; they appear in almost every paper.
ŷ_{ui} is the model's output; y_{ui} is the truth. Pointwise losses measure the gap between hat and no-hat, while pairwise losses such as BPR compare two hatted scores.
p_u (bold) = entire embedding vector. p_{u,k} (plain) = one number. Never confuse the two.