Beyond structure — the recurring mathematical tools
that appear in almost every RS theory paper.
Showing a quantity is at most X or at least Y. Used in convergence proofs, regret bounds, approximation guarantees.
If a function is convex, gradient descent finds the global minimum. Used to justify optimisation algorithms.
Bounding how fast a function changes. Used to prove gradient descent converges at a certain rate.
Recognition skill: You don't need to derive these from scratch. You need to recognise when a paper is using them and understand what the argument is claiming.
The core idea: show a quantity cannot be too large or too small. Common in:
Key inequalities you'll see: Triangle inequality, Cauchy-Schwarz, AM-GM, Jensen's inequality. Each lets you replace a complex expression with a simpler bound.
A function f is convex if the line segment between any two points lies above the function:
Why it matters in RS:
This is how MF is justified — alternating least squares can be used because each sub-problem is convex.
Most deep RS models are not jointly convex. That's fine — but the paper should acknowledge this explicitly.
A function f is L-Lipschitz if it doesn't change too fast:
L is the Lipschitz constant — it bounds the maximum gradient magnitude.
Why it appears in RS/ML:
When you encounter a proof, ask these questions in order:
The critical question: Does the paper's proof actually apply to their specific model? Many papers cite general convergence results that technically don't apply to their non-convex architecture.
Most common in convergence proofs and approximation results. Look for triangle inequality and Cauchy-Schwarz.
Squared loss, log loss, L2 reg are all convex. Most deep models are not — but each block can be.
Enables convergence rate proofs. Check: is L computed or assumed? Does the bound actually apply to this model?
Next: M5 · L5 — Writing Your First Proof