M1 · Lesson 1 — Math Notation Literacy

Greek Letters & Common Symbols

Every recommender systems (RS) paper is written in two languages: English and math symbols.
This lesson builds your symbol vocabulary: the alphabet of equations.

M1 · L1 — Why This Matters

The cost of symbol blindness

90% of equation confusion comes from unfamiliar symbols

When you hit an unfamiliar symbol mid-equation, your reading stops. You lose the thread. The math starts to feel impossible — but it's not the math that's hard, it's the alphabet.

Once you build symbol fluency, you'll realise most RS equations are built from the same 20–30 symbols in different arrangements.

The test: You should be able to read the symbols in this equation out loud, even before understanding what it means:
\( \mathcal{L} = -\sum_{(u,i,j)\in\mathcal{D}} \ln\sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda\|\theta\|^2 \)
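
One way to practise reading this equation is to map each symbol to a line of code. The sketch below is illustrative only: the function names, the toy scores, and the choice to store θ as a flat list of numbers are assumptions made for this example, not any particular paper's implementation. The equation has the form of a pairwise ranking loss with an L2 penalty.

```python
import math

def sigmoid(x):
    """Lowercase sigma: sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_loss(triples, y_hat, theta, lam):
    """Evaluate L = -sum ln sigma(y_ui - y_uj) + lambda * ||theta||^2 on toy data.

    triples : iterable of (u, i, j), playing the role of the set D
    y_hat   : dict mapping (user, item) to a predicted score
    theta   : flat list of parameter values (an assumption for this sketch)
    lam     : regularisation strength lambda
    """
    ranking_term = -sum(
        math.log(sigmoid(y_hat[(u, i)] - y_hat[(u, j)]))
        for (u, i, j) in triples
    )
    penalty = lam * sum(w * w for w in theta)  # lambda * ||theta||^2
    return ranking_term + penalty

# Hypothetical toy data: one user, item "a" scored above item "b".
D = [("u1", "a", "b")]
y_hat = {("u1", "a"): 2.0, ("u1", "b"): 0.5}
theta = [0.1, -0.2, 0.3]
loss = pairwise_loss(D, y_hat, theta, lam=0.01)
print(loss)
```

Reading the code next to the equation shows where each symbol lands: the capital Σ becomes `sum`, the lowercase σ becomes `sigmoid`, and λ scales the squared norm of θ.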

Greek letters are used as variables, constants, and hyperparameters
Each letter has a conventional meaning — not fixed, but strongly conventional in RS
The capital and lowercase forms of the same letter usually mean completely different things
Always check the paper's notation table — usually in Section 3 (Preliminaries)

M1 · L1 — Greek Letters

The core vocabulary — part 1

Greek letters most used in RS optimisation

Symbol	Name	Meaning in RS	Example
α	alpha	Learning rate	\(\theta \leftarrow \theta - \alpha \nabla\mathcal{L}\)
β	beta	Regularisation weight or secondary hyperparameter
γ	gamma	Discount factor in RL-based RS
λ	lambda	Regularisation strength	\(\lambda\|\theta\|^2\)
θ	theta	All model parameters (the learnable set)
σ	sigma (lowercase)	Sigmoid function	\(\sigma(x)=\frac{1}{1+e^{-x}}\)
μ	mu	Mean of a distribution
ε	epsilon	Noise term or exploration rate
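
To see α, λ and θ working together, the update rule \(\theta \leftarrow \theta - \alpha \nabla\mathcal{L}\) can be run on a toy problem. The sketch below is a minimal illustration under stated assumptions: the one-parameter loss \((\theta - 3)^2 + \lambda\theta^2\), the starting point, and the hyperparameter values are all invented for the example.

```python
def grad(theta, lam):
    """Gradient of the toy loss L(theta) = (theta - 3)^2 + lam * theta^2."""
    return 2.0 * (theta - 3.0) + 2.0 * lam * theta

alpha = 0.1   # alpha: learning rate
lam = 0.5     # lambda: regularisation strength
theta = 0.0   # theta: the single learnable parameter in this toy model

for _ in range(200):
    theta = theta - alpha * grad(theta, lam)   # theta <- theta - alpha * grad L

print(theta)
```

Without the penalty the minimum would sit at θ = 3. With λ = 0.5 the minimiser of this toy loss is 3 / (1 + λ) = 2, which is where the loop settles: a larger λ pulls θ further towards zero.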

M1 · L1 — Greek Letters

The core vocabulary — part 2

More Greek letters: functions and structures

Symbol	Name	Meaning in RS	Example
φ	phi	Feature mapping or encoder function
ψ	psi	Decoder or projection function
η	eta	Step size or learning rate (alternative)
ρ	rho	Correlation or density parameter
Σ	Sigma (capital)	Summation	\(\sum_{i=1}^n x_i\)
Δ	Delta (capital)	Change in quantity or difference
Π	Pi (capital)	Product notation	\(\prod_{i=1}^n x_i\)
Ω	Omega (capital)	Sample space or parameter domain

Critical: Σ (capital, summation) and σ (lowercase, sigmoid) share a name but mean completely different things. Always check case.
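
The difference between the two sigmas becomes obvious once each is written as code. This is a small sketch with made-up input values; the names `sigma`, `total`, and `squashed` are chosen for illustration.

```python
import math

def sigma(x):
    """Lowercase sigma: the sigmoid function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

x = [-1.0, 0.0, 1.0]

total = sum(x)                              # capital Sigma: sum over i of x_i
squashed = [sigma(v) for v in x]            # lowercase sigma applied to each x_i
total_squashed = sum(sigma(v) for v in x)   # both at once: sum over i of sigma(x_i)

print(total, squashed, total_squashed)
```

Capital Σ collapses a whole list into one number, while lowercase σ transforms one number at a time. Equations such as loss functions often nest one inside the other, as the last line does.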

M1 · L1 — Common Confusion

The most confusing pairs in RS papers

Same letter, completely different meaning

Symbol	Name	Meaning	Example
Σ	Capital Sigma	Summation operator	\(\sum_{i=1}^n x_i\)
σ	Lowercase sigma	Sigmoid function	\(\sigma(x) = \frac{1}{1+e^{-x}}\)
Π	Capital Pi	Product notation	\(\prod_{i=1}^n p_i\)
π	Lowercase pi	Policy in RL-RS or probability vector
θ	Lowercase theta	All model parameters as a set
Θ	Capital Theta	Big-Theta notation (complexity)
φ	Lowercase phi	Feature mapping	\(\phi: \mathcal{X} \to \mathbb{R}^d\)
Φ	Capital Phi	Full parameter matrix or CDF of Gaussian
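
The Pi and Phi pairs can be told apart in code as well. The sketch below uses invented probabilities and a hypothetical three-item policy; the standard normal CDF is written with the error function from Python's `math` module.

```python
import math

# Lowercase pi: a hypothetical policy, i.e. a probability vector over three items.
pi = {"item_a": 0.5, "item_b": 0.3, "item_c": 0.2}
policy_total = sum(pi.values())   # a probability vector sums to 1

# Capital Pi: the product operator, here the probability of three independent picks.
picks = ["item_a", "item_b", "item_a"]
joint = math.prod(pi[item] for item in picks)   # product over i of p_i

def Phi(x):
    """Capital Phi in its Gaussian sense: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(policy_total, joint, Phi(0.0))
```

Lowercase π is an object (a distribution you sample from), whereas capital Π is an operation applied to a sequence of numbers.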

M1 · L1 — Operators

The logical operators

Operators that appear in every RS paper

∈	"in" / "element of"	u ∈ 𝒰 → u is a user in the user set
∉	"not in"	j ∉ 𝒩_u → j is not in u's neighbourhood
∀	"for all"	∀u ∈ 𝒰 → for every user
∃	"there exists"	∃i such that r_{ui} > 0
∝	"proportional to"	P(u|i) ∝ P(i|u)·P(u)
≜	"defined as"	Introduces a new definition

≈	"approximately"	Used when simplifying or approximating
ℝ	"real numbers"	ℝ^{m×n} = the set of real matrices with m rows × n cols
ℕ	"natural numbers"	k ∈ ℕ → k is a positive integer (some papers also include 0)
⊆	"subset of"	𝒮 ⊆ ℐ → 𝒮 is a subset of items
←	"is assigned"	θ ← θ - α∇ℒ → update rule
·	"dot product"	p_u · q_i = scalar similarity score
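
Most of these operators have a direct counterpart in code, which can make them easier to remember. The sketch below uses Python's built-in sets; all user names, item names, and ratings are invented for illustration.

```python
users = {"u1", "u2", "u3"}            # the user set
items = {"i1", "i2", "i3", "i4"}      # the item set
neighbours_u1 = {"u2"}                # neighbourhood of user u1
r = {("u1", "i1"): 5, ("u2", "i3"): 3, ("u3", "i1"): 4}   # observed ratings r_ui

in_set = "u1" in users                            # "element of"
not_in_set = "u3" not in neighbours_u1            # "not in"
for_all = all(u in users for (u, _) in r)         # "for all": every rating comes from a known user
exists = any(value > 0 for value in r.values())   # "there exists" a rating with r_ui > 0

shortlist = {"i1", "i3"}
subset = shortlist <= items                       # "subset of"

p_u = [0.5, 1.0]
q_i = [2.0, 1.0]
score = sum(a * b for a, b in zip(p_u, q_i))      # dot product: one scalar

theta = 1.0
theta = theta - 0.1 * 2.0                         # "is assigned": an update rule

print(in_set, not_in_set, for_all, exists, subset, score, theta)
```

The quantifiers ∀ and ∃ map onto `all` and `any`, which is a useful mental shortcut when a definition stacks several of them.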

M1 · L1 — Font Conventions

The font conventions most RS papers follow

What the font tells you before you read the symbol

Calligraphic Capital 𝒰 𝒜 ℐ

= A Set (usually)

𝒰 = user set, ℐ = item set, 𝒟 = training data, 𝒢 = graph, 𝒩 = neighbourhood. The common exception is ℒ, which denotes the loss function rather than a set.

LaTeX: \mathcal{U}

Bold Uppercase P Q R

= A Matrix

P ∈ ℝ^{m×d} = user embedding matrix, Q = item matrix, R = rating matrix, A = adjacency matrix

LaTeX: \mathbf{P}

Bold Lowercase p q e

= A Vector

p_u = user u's embedding vector, q_i = item i's vector, e = embedding, h = hidden state

LaTeX: \mathbf{p}

Plain lowercase r ŷ x

= A Scalar (a single number)

r_{ui} = one rating value, ŷ_{ui} = one predicted score, x_k = one feature value. Plain font usually signals a scalar; plain Greek letters such as θ (a parameter set) or σ (a function) are the main exceptions.
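
For readers who also write equations, the four font conventions might be typeset as in the sketch below. It assumes the amssymb package for `\mathbb`, and the statement itself is just an illustration assembled from symbols used in this lesson.

```latex
% Requires \usepackage{amssymb} for \mathbb
\[
  \hat{y}_{ui} = \mathbf{p}_u^{\top}\mathbf{q}_i,
  \qquad u \in \mathcal{U},\; i \in \mathcal{I},\;
  \mathbf{P} \in \mathbb{R}^{m \times d}
\]
% \mathcal{U}, \mathcal{I}   : calligraphic capitals, sets
% \mathbf{P}                 : bold uppercase, a matrix
% \mathbf{p}_u, \mathbf{q}_i : bold lowercase, vectors
% \hat{y}_{ui}               : plain lowercase, a scalar
```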

M1 · L1 — Worked Example

Decoding a real paper sentence

Reading a parameter definition from scratch

From a Matrix Factorisation paper

"Let θ = {P, Q} denote the model parameters, where P ∈ ℝ^{m×d} and Q ∈ ℝ^{n×d}, and λ > 0 controls the regularisation strength."

Symbol	Meaning
θ	All learnable parameters: both embedding matrices together
{P, Q}	θ is the set containing exactly two matrices: user embeddings and item embeddings
P ∈ ℝ^{m×d}	P is a real-valued matrix with m rows (one per user) and d columns (latent dims)
Q ∈ ℝ^{n×d}	Q has the same d columns as P but n rows (one per item)
λ > 0	λ (lambda) is a positive scalar hyperparameter controlling regularisation strength

Plain English

Our model has two learnable matrices, one with a row per user and one with a row per item, plus a regularisation-strength hyperparameter.
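
The quoted sentence translates almost word for word into code. The following sketch is one possible rendering, not the quoted paper's implementation: the sizes m, n, d, the random initialisation, and the helper name `predict` are assumptions. The score uses the formula ŷ_{ui} = p_u^⊤ q_i that this lesson lists among the RS conventions.

```python
import random

m, n, d = 3, 4, 2   # users, items, latent dimensions (toy sizes)
lam = 0.01          # lambda > 0: regularisation strength

rng = random.Random(0)
P = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(m)]   # P: m rows, d columns
Q = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n)]   # Q: n rows, d columns
theta = {"P": P, "Q": Q}   # theta = {P, Q}: everything the model learns

num_params = m * d + n * d   # total count of learnable numbers

def predict(u, i):
    """Predicted score for user u and item i: dot product of row u of P and row i of Q."""
    return sum(a * b for a, b in zip(P[u], Q[i]))

y_hat_00 = predict(0, 0)
print(num_params, y_hat_00)
```

Note that λ is not part of θ: it is set by hand before training, while P and Q are what training changes.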

M1 · L1 — RS Conventions

Conventions specific to recommender systems

The RS-specific symbol vocabulary

Symbol	Name	Meaning in RS	Example
𝒰	User set	Set of all users in the system	|𝒰| = m users total
ℐ	Item set	Set of all items in the system	|ℐ| = n items total
𝒪	Observed set	All known user-item interactions	(u,i) ∈ 𝒪 means u interacted with i
r_{ui}	Rating	Score user u gave item i	r_{ui} ∈ {1,2,3,4,5} or {0,1}
ŷ_{ui}	Predicted score	Model's predicted preference	ŷ_{ui} = p_u^⊤ q_i
ℐ⁺_u	Positive items	Items user u has interacted with	ℐ⁺_u = {i : r_{ui} > 0}
𝒩_u	Neighbourhood	Users or items connected to u	In graph-based RS: direct neighbours
d	Latent dimension	Size of embedding vectors	p_u ∈ ℝ^d, typically d=64 or 128
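
Several rows of this table can be derived from a single dictionary of ratings. The sketch below is illustrative: the ratings are invented, and it assumes the user and item sets contain only users and items that appear in the observed data, which a real dataset need not satisfy.

```python
from collections import defaultdict

# r_ui: hypothetical ratings keyed by (user, item)
r = {("u1", "i1"): 5, ("u1", "i2"): 4, ("u2", "i2"): 3, ("u3", "i3"): 1}

O = set(r)                   # observed set: all known (u, i) pairs
U = {u for (u, _) in O}      # user set
I = {i for (_, i) in O}      # item set
m, n = len(U), len(I)        # |U| = m, |I| = n

positives = defaultdict(set)
for (u, i), rating in r.items():
    if rating > 0:
        positives[u].add(i)  # positive items of u: {i : r_ui > 0}

print(m, n, dict(positives))
```

The vertical bars in |𝒰| simply become `len`, and the set-builder notation {i : r_{ui} > 0} becomes a filtered loop or comprehension.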

M1 · L1 — Important Warning

The most important habit to build

Same symbol, different meaning in different papers

Conventions are strong but not fixed. These are all valid usages in different RS papers:

Symbol	Paper A	Paper B
β	Regularisation weight	KL penalty in β-VAE
k	Number of items to recommend	Latent dimension size
L	Loss function	Number of GNN layers
d	Embedding dimension	Degree of a graph node
𝒩_u	Neighbour users	Neighbour items

Rule #1

Always find the notation section

Usually Table 1 or the start of Section 3. Read it first before reading equations.

Rule #2

First use = definition

Papers usually define a symbol the first time it appears in the text. If you're confused mid-paper, search backwards for the first occurrence.

The habit: Before reading equations, spend 5 minutes reading the notation table and the first paragraph of Section 3. It saves you 30 minutes of confusion later.

M1 · L1 — Key Takeaways

What to remember

01 · Greek letters

α, β, γ, λ, θ, σ, μ, ε

Learn their conventional RS meaning. α = learning rate. λ = regularisation. θ = all parameters. σ = sigmoid.

02 · Font = type

𝒰 = set · P = matrix · p = vector · r = scalar

The font tells you what kind of mathematical object you're looking at, before you read what it represents.

03 · Context first

Read the notation table before equations

Same symbol, different paper, different meaning. Check every time. First occurrence = definition.

Next: M1 · L2 — Subscripts, Superscripts & Indexing
