M5 · Lesson 4 — Mathematical Reasoning

Common Proof
Techniques in RS/ML

Beyond structure — the recurring mathematical tools
that appear in almost every RS theory paper.

01
M5 · L4 — Overview

The toolkit

Three techniques that
appear everywhere

Bounding

Upper/lower bounds on loss

Showing a quantity is at most X or at least Y. Used in convergence proofs, regret bounds, approximation guarantees.

Convexity

Convex/concave arguments

If a function is convex, gradient descent finds the global minimum. Used to justify optimisation algorithms.

Lipschitz

Lipschitz continuity

Bounding how fast a function changes. Used to prove gradient descent converges at a certain rate.

Recognition skill: You don't need to derive these from scratch. You need to recognise when a paper is using them and understand what the argument is claiming.

02
M5 · L4 — Bounding Arguments

Technique 1

Bounding arguments

The core idea: show a quantity cannot be too large or too small. Common in:

  • Convergence proofs — "the loss decreases by at least ε per step"
  • Approximation results — "our method achieves within α of optimal"
  • Generalisation bounds — "test error ≤ training error + complexity term"

Key inequalities you'll see: Triangle inequality, Cauchy-Schwarz, AM-GM, Jensen's inequality. Each lets you replace a complex expression with a simpler bound.

RS Example · BPR loss upper bound
1.
Claim: The BPR loss ℒ_BPR ≥ 0.
2.
σ(x) ∈ (0,1) for all x, so ln σ(x) ∈ (−∞, 0).
3.
−ln σ(x) > 0 for all x. Each term positive.
4.
λ‖θ‖² ≥ 0 since λ > 0.
5.
ℒ_BPR ≥ 0. A valid lower bound of 0 exists. □
03
M5 · L4 — Convexity

Technique 2

Convexity arguments

A function f is convex if the line segment between any two points lies above the function:

f(αx + (1−α)y) ≤ αf(x) + (1−α)f(y)

Why it matters in RS:

  • If ℒ is convex → every local minimum is a global minimum
  • Squared loss is convex. Log-loss is convex. L2 regularisation is convex.
  • Sum of convex functions is convex → MF loss is convex in each embedding separately
In papers — signal phrase

"The objective is jointly non-convex but convex in each block."

This is how MF is justified — alternating least squares can be used because each sub-problem is convex.

Common mistake

Convex in each variable ≠ jointly convex

Most deep RS models are not jointly convex. That's fine — but the paper should acknowledge this explicitly.

04
M5 · L4 — Lipschitz Continuity

Technique 3

Lipschitz continuity

A function f is L-Lipschitz if it doesn't change too fast:

‖f(x) − f(y)‖ ≤ L‖x − y‖

L is the Lipschitz constant — it bounds the maximum gradient magnitude.

Why it appears in RS/ML:

  • Proves gradient descent converges: step size α < 1/L guarantees decrease
  • Bounds how much predictions change when input changes slightly
  • Used in robustness arguments for recommendation under perturbation
Reading a Lipschitz argument in a paper
1.
Look for: "assume f is L-smooth" or "the gradient is Lipschitz continuous"
2.
This means: ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ — the gradient itself doesn't change too fast.
3.
Consequence: With learning rate α ≤ 1/L, gradient descent decreases the loss at every step.
4.
What to check: Is L actually computed or just assumed? Many papers assume smoothness without verifying it.
05
M5 · L4 — Reading a Proof

Putting it together

How to read a proof
that uses these techniques

When you encounter a proof, ask these questions in order:

  • What is being claimed? Read the theorem statement, not the proof.
  • What technique is used? Look for bounding, convexity, or Lipschitz arguments.
  • What assumptions are required? Every technique requires conditions — are they met?
  • Is the bound tight? A loose bound proves the claim but may not be useful in practice.
"You don't need to re-derive every proof. You need to understand what it's claiming and whether the assumptions are realistic."

The critical question: Does the paper's proof actually apply to their specific model? Many papers cite general convergence results that technically don't apply to their non-convex architecture.

06
M5 · L4 — Key Takeaways

What to remember

Bounding

Show a quantity ≤ X or ≥ Y

Most common in convergence proofs and approximation results. Look for triangle inequality and Cauchy-Schwarz.

Convexity

Every local = global minimum

Squared loss, log loss, L2 reg are all convex. Most deep models are not — but each block can be.

Lipschitz

Gradient bounded by L

Enables convergence rate proofs. Check: is L computed or assumed? Does the bound actually apply to this model?

Next: M5 · L5 — Writing Your First Proof

07
← → arrow keys to navigate