M5 · Lesson 4 — Mathematical Reasoning

Common Proof
Techniques in RS/ML

Beyond structure — the recurring mathematical tools
that appear in almost every RS theory paper.

01

M5 · L4 — Overview

The toolkit

Three techniques that
appear everywhere

Bounding

Upper/lower bounds on loss

Showing a quantity is at most X or at least Y. Used in convergence proofs, regret bounds, approximation guarantees.

Convexity

Convex/concave arguments

If a function is convex, gradient descent finds the global minimum. Used to justify optimisation algorithms.

Lipschitz

Lipschitz continuity

Bounding how fast a function changes. Used to prove gradient descent converges at a certain rate.

Recognition skill: You don't need to derive these from scratch. You need to recognise when a paper is using them and understand what the argument is claiming.

02

M5 · L4 — Bounding Arguments

Technique 1

Bounding arguments

The core idea: show a quantity cannot be too large or too small. Common in:

Convergence proofs — "the loss decreases by at least ε per step"
Approximation results — "our method achieves within α of optimal"
Generalisation bounds — "test error ≤ training error + complexity term"

Key inequalities you'll see: Triangle inequality, Cauchy-Schwarz, AM-GM, Jensen's inequality. Each lets you replace a complex expression with a simpler bound.

RS Example · BPR loss upper bound

1.

Claim: The BPR loss ℒ_BPR ≥ 0.

2.

σ(x) ∈ (0,1) for all x, so ln σ(x) ∈ (−∞, 0).

3.

−ln σ(x) > 0 for all x. Each term positive.

4.

λ‖θ‖² ≥ 0 since λ > 0.

5.

∴ ℒ_BPR ≥ 0. A valid lower bound of 0 exists. □

03

M5 · L4 — Convexity

Technique 2

Convexity arguments

A function f is convex if the line segment between any two points lies above the function:

f(αx + (1−α)y) ≤ αf(x) + (1−α)f(y)

Why it matters in RS:

If ℒ is convex → every local minimum is a global minimum
Squared loss is convex. Log-loss is convex. L2 regularisation is convex.
Sum of convex functions is convex → MF loss is convex in each embedding separately

In papers — signal phrase

"The objective is jointly non-convex but convex in each block."

This is how MF is justified — alternating least squares can be used because each sub-problem is convex.

Common mistake

Convex in each variable ≠ jointly convex

Most deep RS models are not jointly convex. That's fine — but the paper should acknowledge this explicitly.

04

M5 · L4 — Lipschitz Continuity

Technique 3

Lipschitz continuity

A function f is L-Lipschitz if it doesn't change too fast:

‖f(x) − f(y)‖ ≤ L‖x − y‖

L is the Lipschitz constant — it bounds the maximum gradient magnitude.

Why it appears in RS/ML:

Proves gradient descent converges: step size α < 1/L guarantees decrease
Bounds how much predictions change when input changes slightly
Used in robustness arguments for recommendation under perturbation

Reading a Lipschitz argument in a paper

1.

Look for: "assume f is L-smooth" or "the gradient is Lipschitz continuous"

2.

This means: ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ — the gradient itself doesn't change too fast.

3.

Consequence: With learning rate α ≤ 1/L, gradient descent decreases the loss at every step.

4.

What to check: Is L actually computed or just assumed? Many papers assume smoothness without verifying it.

05

M5 · L4 — Reading a Proof

Putting it together

How to read a proof
that uses these techniques

When you encounter a proof, ask these questions in order:

What is being claimed? Read the theorem statement, not the proof.
What technique is used? Look for bounding, convexity, or Lipschitz arguments.
What assumptions are required? Every technique requires conditions — are they met?
Is the bound tight? A loose bound proves the claim but may not be useful in practice.

"You don't need to re-derive every proof. You need to understand what it's claiming and whether the assumptions are realistic."

The critical question: Does the paper's proof actually apply to their specific model? Many papers cite general convergence results that technically don't apply to their non-convex architecture.

06

M5 · L4 — Key Takeaways

What to remember

Bounding

Show a quantity ≤ X or ≥ Y

Most common in convergence proofs and approximation results. Look for triangle inequality and Cauchy-Schwarz.

Convexity

Every local = global minimum

Squared loss, log loss, L2 reg are all convex. Most deep models are not — but each block can be.

Lipschitz

Gradient bounded by L

Enables convergence rate proofs. Check: is L computed or assumed? Does the bound actually apply to this model?

Next: M5 · L5 — Writing Your First Proof

07

Common ProofTechniques in RS/ML

The toolkit

Three techniques thatappear everywhere

Upper/lower bounds on loss

Convex/concave arguments

Lipschitz continuity

Technique 1

Bounding arguments

Technique 2

Convexity arguments

"The objective is jointly non-convex but convex in each block."

Convex in each variable ≠ jointly convex

Technique 3

Lipschitz continuity

Putting it together

How to read a proofthat uses these techniques

What to remember

Show a quantity ≤ X or ≥ Y

Every local = global minimum

Gradient bounded by L

Common Proof
Techniques in RS/ML

Three techniques that
appear everywhere

How to read a proof
that uses these techniques