This is the capstone lesson: no new concepts, only synthesis.
You will decode complete, real equations from published recommender-system (RS) papers.
Step 1. Is this a sum? A product? A min? An expectation? The outermost operator tells you what the equation is computing.
Step 2. What is being summed or iterated over? Users? Items? Layers? Pairs? This tells you the scope of the computation.
Step 3. Work from the inside out. Identify each sub-expression, its type (scalar, vector or matrix) and its RS meaning.
Step 4. What is this equation minimising, maximising or computing? What is the model trying to achieve?
Step 5. If you can't express the equation in 2–3 sentences of plain English, go back to Step 3. Understanding isn't complete until you can explain it.
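To see the five steps applied end to end, the sketch below decodes one equation chosen for illustration rather than taken from a specific paper. It is the regularised matrix-factorisation loss ℒ = Σ_{(u,i)∈𝒪} (r_ui − p_u^⊤ q_i)² + λ(‖P‖_F² + ‖Q‖_F²). Each step is recorded as a comment, and the code mirrors the decode: the outer Σ becomes a loop over observed pairs, the inner dot product is the prediction, and the λ term is the penalty. The function name `mf_loss` and the toy data are hypothetical.

```python
import numpy as np

def mf_loss(R_obs, P, Q, lam):
    """Regularised matrix-factorisation loss, decoded step by step.

    Step 1 (outermost operator): a sum of squared errors plus a penalty term.
    Step 2 (scope): the sum runs over observed (u, i, r_ui) triples only.
    Step 3 (inside out): P[u] @ Q[i] is the scalar prediction y_hat_ui.
    Step 4 (goal): training searches for the P, Q that minimise this value.
    Step 5 (plain English): "predict each known rating with a dot product,
    penalise the squared misses, and keep the embeddings small."
    """
    squared_error = 0.0
    for u, i, r_ui in R_obs:            # Σ over (u, i) in the observed set 𝒪
        y_hat = P[u] @ Q[i]             # ŷ_ui = p_u^T q_i
        squared_error += (r_ui - y_hat) ** 2
    # λ(‖P‖_F² + ‖Q‖_F²): squared Frobenius norms of both embedding matrices
    penalty = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return squared_error + penalty

# Toy data (hypothetical): 2 users, 2 items, 2-dimensional embeddings.
P = np.array([[1.0, 0.0], [0.0, 1.0]])   # one row per user
Q = np.array([[1.0, 1.0], [0.5, 0.0]])   # one row per item
R_obs = [(0, 0, 2.0), (1, 1, 1.0)]       # observed (user, item, rating) triples
print(mf_loss(R_obs, P, Q, lam=0.1))
```

Setting `lam=0.0` leaves only the squared-error sum. This makes it easy to check that the penalty term is the only part that depends on λ.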
Why average across layers? Layer l aggregates signals along paths of length l in the user-item graph, so each extra layer reaches further. Averaging combines local (l=1) and global (l=L) collaborative signals in one embedding.
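The sketch below shows layer averaging under stated assumptions. It assumes a LightGCN-style combination e_u = (1/(L+1)) Σ_{l=0}^{L} e_u^{(l)}, where layer 0 is the raw embedding and each layer multiplies by a symmetrically normalised adjacency matrix Â = D^{-1/2} A D^{-1/2}. The toy graph and the name `layer_average` are hypothetical.

```python
import numpy as np

def layer_average(A_hat, E0, num_layers):
    """Average node embeddings over propagation layers 0..L.

    A_hat: (n, n) normalised adjacency of the user-item graph (assumed given).
    E0:    (n, d) layer-0 embeddings, one row per user or item node.
    """
    layers = [E0]
    E = E0
    for _ in range(num_layers):
        E = A_hat @ E              # layer l+1 aggregates neighbours' layer-l embeddings
        layers.append(E)
    return sum(layers) / (num_layers + 1)   # (1/(L+1)) Σ_l e^{(l)}

# Toy bipartite graph (hypothetical): nodes 0, 1 are users; nodes 2, 3 are items.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
deg = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(deg, deg))      # D^{-1/2} A D^{-1/2}
E0 = np.eye(4)                               # one-hot layer-0 embeddings
E_final = layer_average(A_hat, E0, num_layers=2)
print(E_final.round(3))
```

User 1 only interacted with item 2, which user 0 also interacted with. With `num_layers=1`, user 1's row has zero weight on user 0. With `num_layers=2`, the two-hop path through item 2 makes that weight positive, which is the "further away" signal the averaging folds in.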
| See this | It means |
|---|---|
| α | Learning rate |
| λ | Regularisation strength |
| θ | All model parameters |
| σ(x) | Sigmoid function |
| Σ (capital) | Summation (for-loop) |
| r_{ui} | True rating u→i |
| ŷ_{ui} | Predicted score u→i |
| e_u^{(l)} | User embedding at layer l |
| ‖·‖ | L2 norm (default) |
| ‖M‖_F | Frobenius norm |
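The sketch below translates each row of the table above into a NumPy expression, which helps connect the symbols to code. It assumes a dot-product scorer ŷ_ui = p_u^⊤ q_i and one plain SGD step on a squared error with L2 regularisation. Neither is tied to a particular paper, and every numeric value is made up.

```python
import numpy as np

def sigmoid(x):
    """σ(x) = 1 / (1 + e^{-x}); squashes any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

alpha = 0.05    # α: learning rate (illustrative value)
lam = 0.01      # λ: regularisation strength (illustrative value)

p_u = np.array([0.3, -0.1, 0.8])    # user embedding (a vector)
q_i = np.array([0.5, 0.4, 0.2])     # item embedding (a vector)
r_ui = 1.0                          # true (observed) rating of user u for item i

y_hat_ui = p_u @ q_i                # ŷ_ui: predicted score (hat = predicted)
prob_ui = sigmoid(y_hat_ui)         # σ(ŷ_ui): the score read as a probability

l2_norm = np.linalg.norm(p_u)       # ‖p_u‖: L2 norm, NumPy's default for vectors
P = np.stack([p_u, q_i])            # a 2 x 3 matrix
frob = np.linalg.norm(P, "fro")     # ‖P‖_F: Frobenius norm

# One SGD step on (r_ui - ŷ_ui)² + λ(‖p_u‖² + ‖q_i‖²); the constant 2 is folded into α.
err = r_ui - y_hat_ui
p_new = p_u + alpha * (err * q_i - lam * p_u)
q_new = q_i + alpha * (err * p_u - lam * q_i)
```

After the step, `p_new @ q_new` lies closer to `r_ui` than `y_hat_ui` did. α controls how far each step moves, and λ pulls both embeddings slightly towards zero.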
| See this | It means |
|---|---|
| ∂ℒ/∂θ | Partial derivative (how ℒ changes w.r.t. θ) |
| ∇_θ ℒ | Gradient — vector of all partial derivatives |
| argmin_θ ℒ | θ values that minimise ℒ |
| 𝒰, ℐ, 𝒪 (calligraphic) | Sets (user, item, observed) |
| **P**, **Q** (bold upper) | Matrices |
| **p**_u (bold lower) | Vector |
| 𝔼_{x~P}[·] | Expected value over samples from P |
| D_KL(Q‖P) | KL divergence: how far distribution Q is from P (not symmetric) |
| x ~ 𝒩(μ,σ²) | x sampled from Gaussian |
| \|·\| (bars) | Cardinality (set size) or absolute value |
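The calculus and set rows of this table can be checked numerically. The sketch below uses a made-up one-parameter loss ℒ(θ) = (θ − 3)² + λθ². It approximates ∂ℒ/∂θ with a central finite difference, compares that to the analytic derivative, and finds argmin_θ ℒ by repeated gradient-descent steps. It then shows a set of observed pairs and its cardinality. The loss, step size, set contents and function names are illustrative assumptions.

```python
lam = 0.5   # λ (illustrative)

def loss(theta):
    """A toy loss ℒ(θ) = (θ - 3)² + λθ²."""
    return (theta - 3.0) ** 2 + lam * theta ** 2

def grad_analytic(theta):
    """∂ℒ/∂θ = 2(θ - 3) + 2λθ; with one parameter, ∇_θ ℒ is just this number."""
    return 2.0 * (theta - 3.0) + 2.0 * lam * theta

def grad_numeric(theta, h=1e-5):
    """Central finite difference: how much ℒ changes per unit change in θ."""
    return (loss(theta + h) - loss(theta - h)) / (2.0 * h)

# argmin_θ ℒ via gradient descent: repeatedly step against the gradient.
theta, alpha = 0.0, 0.1
for _ in range(200):
    theta -= alpha * grad_analytic(theta)
print(theta)    # close to the closed-form minimiser 3 / (1 + λ) = 2.0

# Sets and cardinality: 𝒪 as a set of observed (user, item) pairs (hypothetical).
observed = {(0, 0), (0, 2), (1, 1)}
items_of_user_0 = {i for (u, i) in observed if u == 0}   # {i : (0, i) ∈ 𝒪}
print(len(observed), len(items_of_user_0))               # prints: 3 2
```

Setting ∂ℒ/∂θ = 0 by hand gives the same minimiser. Gradient descent is shown here because it is how argmin is approached in practice when no closed form exists.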
Greek letters, font conventions, subscripts/superscripts. Bold=matrix/vector. Plain=scalar. Hat=predicted.
Σ=loop, |·|=size, {x | condition}=filter, Π→log→Σ (log turns a product into a sum). Norms measure size. λ‖θ‖² = regularisation. Dot product = similarity.
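The Π→log→Σ move is why likelihoods appear in training objectives as sums of log terms. The sketch below checks the identity log Π_k p_k = Σ_k log p_k on arbitrary made-up probabilities. It also shows the practical reason for the sum form: the raw product of many small probabilities underflows to 0.0 in floating point, while the sum of logs stays finite.

```python
import math

probs = [0.9, 0.7, 0.95, 0.6]                   # p_k: illustrative probabilities

log_of_product = math.log(math.prod(probs))     # log Π_k p_k
sum_of_logs = sum(math.log(p) for p in probs)   # Σ_k log p_k
print(math.isclose(log_of_product, sum_of_logs))   # prints: True

# With many factors the product underflows, but the log-sum does not.
many = [0.01] * 400
print(math.prod(many))                    # 0.0: 1e-800 is below the smallest float
print(sum(math.log(p) for p in many))     # about -1842.07, i.e. 400 * log(0.01)
```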
∂=partial, ∇=gradient, argmin=training goal. P(·|·)=conditional, 𝔼=average, KL=distribution gap.
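The probability notation can be made concrete in the same way. The sketch below assumes two small discrete distributions over three items, with made-up numbers. It estimates 𝔼_{x~P}[f(x)] by averaging over samples and computes D_KL(Q‖P) = Σ_x Q(x) log(Q(x)/P(x)). Swapping the arguments gives a different value, which is why "distribution gap" is only an informal reading. Finally, it turns a hypothetical count table into conditionals P(item | user).

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])   # P: a distribution over 3 items (illustrative)
q = np.array([0.1, 0.3, 0.6])   # Q: another distribution over the same items
f = np.array([1.0, 2.0, 5.0])   # f(x): some value attached to each item

# 𝔼_{x~P}[f(x)]: exact weighted average vs. a Monte Carlo estimate from samples.
exact = np.sum(p * f)                         # 0.5*1 + 0.3*2 + 0.2*5 = 2.1
samples = rng.choice(3, size=100_000, p=p)    # x ~ P
estimate = f[samples].mean()

def kl(a, b):
    """D_KL(a‖b) = Σ_x a(x) log(a(x) / b(x)); assumes b(x) > 0 wherever a(x) > 0."""
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

print(kl(q, p), kl(p, q))   # two different values: KL is not symmetric

# P(item | user): normalise each row of a (hypothetical) user x item count table.
counts = np.array([[3, 1, 0],
                   [0, 2, 2]], dtype=float)
p_item_given_user = counts / counts.sum(axis=1, keepdims=True)
```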
What comes next
You can now open any RS paper and read its equations without freezing. The next step is M2: Reading Papers Effectively — where notation fluency pays off in real paper analysis.