M1 · Lesson 7 — Math Notation Literacy

Putting It All Together —
Full Equation Decoding

The capstone lesson. No new concepts — only synthesis.
You decode complete real equations from published RS papers.

01

M1 · L7 — Reading Strategy

A systematic approach to any equation

The 5-step equation
decoding process

Step 1

Read the outer structure

Is this a sum? A product? A min? An expectation? The outermost operator tells you what the equation is computing.

Step 2

Identify the index set

What is being summed or iterated over? Users? Items? Layers? Pairs? This tells you the scope of the computation.

Step 3

Decode each term

Work from inside out. Identify each sub-expression, its type (scalar/vector/matrix), and its RS meaning.

Step 4

Find the goal

What is this equation minimising/maximising/computing? What is the model trying to achieve?

Step 5

Write plain English

If you can't express the equation in 2–3 sentences of plain English, go back to Step 3. Understanding isn't complete until you can explain it.

02

M1 · L7 — Equation 1: LightGCN

Worked decoding — LightGCN + BPR

Equation 1:
The training objective

\[ \mathcal{L} = \sum_{(u,i,j)\in\mathcal{D}} -\ln\sigma\!\left(\mathbf{e}_u^{*\top}\mathbf{e}_i^* - \mathbf{e}_u^{*\top}\mathbf{e}_j^*\right) + \lambda\|\mathbf{E}^{(0)}\|^2 \]

∑_{(u,i,j)∈𝒟}

Loop over training triples: user u, positive item i, negative item j

e_u^{*⊤} e_i^*

Dot product of user u's FINAL embedding with positive item i's final embedding

e_u^{*⊤} e_j^*

Dot product of user u's final embedding with negative item j's final embedding

σ(pos - neg)

Sigmoid of the score difference: converts it into the probability that the positive item outranks the negative one

−ln σ(·)

Negative log-likelihood — minimising this maximises the ranking probability

λ‖E^{(0)}‖²

Regularise only the INITIAL embeddings (layer 0) — all other layers are derived from these

Plain English

For every (user, positive, negative) triple, push the positive item's score above the negative. Regularise only the trainable initial embeddings.
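The decoding above can be sketched in a few lines of NumPy. This is a minimal illustration, not LightGCN's actual training code: the function name `bpr_loss`, the toy arrays, and the choice to pass the final embeddings in as ready-made arrays are all assumptions made for this example. In the real model the final embeddings are derived from E^{(0)}.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(user_final, item_final, triples, E0, lam):
    """Sum of -ln sigma(pos - neg) over all triples, plus lam * ||E0||^2."""
    total = 0.0
    for u, i, j in triples:                    # sum over (u, i, j) in D
        pos = user_final[u] @ item_final[i]    # e_u*^T e_i*
        neg = user_final[u] @ item_final[j]    # e_u*^T e_j*
        total += -np.log(sigmoid(pos - neg))   # -ln sigma(pos - neg)
    reg = lam * np.sum(E0 ** 2)                # lam * ||E^(0)||^2
    return total + reg

# Toy data (hypothetical): 1 user, 2 items, embedding size 2.
user_final = np.array([[1.0, 0.0]])
item_final = np.array([[2.0, 0.0], [0.0, 1.0]])
triples = [(0, 0, 1)]        # user 0 prefers item 0 over item 1
E0 = np.ones((3, 2))         # stand-in for the stacked initial embeddings

loss_no_reg = bpr_loss(user_final, item_final, triples, E0, lam=0.0)
loss_with_reg = bpr_loss(user_final, item_final, triples, E0, lam=0.1)
```

Here the positive score is 2 and the negative score is 0, so the ranking term is -ln σ(2). Raising λ from 0 to 0.1 adds only the penalty 0.1 · ‖E^{(0)}‖² and leaves the ranking term unchanged.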

03

M1 · L7 — Equation 2: LightGCN

Worked decoding — LightGCN

Equation 2:
The final embedding

\[ \mathbf{e}_u^* = \frac{1}{L+1}\sum_{l=0}^{L} \mathbf{e}_u^{(l)} \]

e_u^*

The FINAL embedding for user u (star = final/optimal)

1/(L+1)

Divide by L+1, the number of embeddings being summed (layer 0 plus L propagated layers), which makes this a simple average

∑_{l=0}^{L}

Sum from layer 0 up to and including layer L (inclusive of the initial embedding)

e_u^{(l)}

User u's embedding at layer l (computed by graph propagation)

Plain English

The final user embedding is the simple average of user u's embeddings across all graph layers, from the initial embedding (l=0) through the last propagated layer (l=L).

Why average across layers? Each layer captures information from further away in the graph. Averaging combines local (l=1) and global (l=L) collaborative signals.
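The averaging step is small enough to check by hand. The sketch below assumes the per-layer embeddings are already available as an array of shape (L+1, d). The function name `final_embedding` and the toy values are hypothetical, and the graph propagation that produces each layer is not shown.

```python
import numpy as np

def final_embedding(layer_embeddings):
    """Average embeddings over layers 0..L; input has shape (L + 1, d)."""
    layers = np.asarray(layer_embeddings, dtype=float)
    num_terms = layers.shape[0]             # L + 1 terms: layers 0, 1, ..., L
    return layers.sum(axis=0) / num_terms   # (1 / (L + 1)) * sum over l

# Toy example (made-up values): L = 2, so three layers, each of size d = 2.
e_u_layers = [[1.0, 0.0],   # l = 0, the initial embedding
              [0.0, 1.0],   # l = 1
              [2.0, 2.0]]   # l = 2
e_u_star = final_embedding(e_u_layers)   # [1.0, 1.0]
```

The divisor is 3, not 2: with L = 2 propagated layers there are L+1 = 3 embeddings in the sum, because layer 0 is included.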

04

M1 · L7 — Quick Reference

Your cheat sheet

The M1 notation
quick reference

See this	It means
α	Learning rate
λ	Regularisation strength
θ	All model parameters
σ(x)	Sigmoid function
Σ (capital)	Summation (for-loop)
r_{ui}	True rating u→i
ŷ_{ui}	Predicted score u→i
e_u^{(l)}	User embedding at layer l
‖·‖	L2 norm (default)
‖M‖_F	Frobenius norm

See this	It means
∂ℒ/∂θ	Partial derivative (how ℒ changes w.r.t. θ)
∇_θ ℒ	Gradient — vector of all partial derivatives
argmin_θ ℒ	θ values that minimise ℒ
𝒰, ℐ, 𝒪 (calligraphic)	Sets (user, item, observed)
P, Q (bold upper)	Matrices
p_u (bold lower)	Vector
𝔼_{x~P}[·]	Expected value over samples from P
D_KL(Q‖P)	KL divergence — gap between Q and P
x ~ 𝒩(μ,σ²)	x sampled from Gaussian
|·| bars	Cardinality (set size) or absolute value
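Several rows of the tables above map directly onto one-line computations. The following sketch uses NumPy with made-up values. Every variable name is hypothetical, and the expectation and KL divergence are shown for a discrete distribution only.

```python
import numpy as np

r = np.array([3.0, 4.0])                   # a vector (bold lower case)
M = np.array([[1.0, 2.0], [3.0, 4.0]])     # a matrix (bold upper case)
observed = {(0, 1), (0, 2), (1, 1)}        # a set of observed (user, item) pairs

# Capital sigma: summation written as a for-loop.
total = 0.0
for x in r:
    total += x

l2 = np.linalg.norm(r)            # L2 norm: sqrt(3^2 + 4^2)
fro = np.linalg.norm(M, "fro")    # Frobenius norm: sqrt of the sum of squared entries
size = len(observed)              # cardinality: number of elements in the set

# Expected value over a discrete distribution P: probability-weighted average.
P = np.array([0.9, 0.1])
values = np.array([1.0, 3.0])
expected = np.sum(P * values)

# KL divergence D_KL(Q || P) for discrete distributions.
Q = np.array([0.5, 0.5])
kl_qp = np.sum(Q * np.log(Q / P))
kl_pq = np.sum(P * np.log(P / Q))  # swapping the arguments gives a different value
```

KL divergence is not symmetric, so the order of Q and P inside D_KL(Q‖P) matters when reading a paper.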

05

M1 Complete — Math Notation Literacy

Module 1 complete —
what you can now do

L1–L2

Alphabet + Indexing

Greek letters, font conventions, subscripts/superscripts. Bold=matrix/vector. Plain=scalar. Hat=predicted.

L3–L4

Operators + Norms

Σ=loop, |·|=filter or size, Π→log→Σ. Norms measure size. λ‖θ‖² = regularisation. Dot product = similarity.

L5–L7

Calculus + Probability

∂=partial, ∇=gradient, argmin=training goal. P(·|·)=conditional, 𝔼=average, KL=distribution gap.

What comes next

You can now open any RS paper and read its equations without freezing. The next step is M2: Reading Papers Effectively — where notation fluency pays off in real paper analysis.
