Modelling the relationship between two numerical variables
Can advertising spend predict monthly sales? Linear regression gives us a mathematical answer — and a decision-making tool for accountants and business analysts.
What is linear regression and how do we build the line?
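Simple linear regression fits a straight line, \(\hat{y} = b_0 + b_1x\), that predicts a response \(y\) from a predictor \(x\). We call it simple because there is exactly one predictor variable.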
An analyst is studying whether advertising spend (in $000) can predict monthly sales (in $000) across 5 store locations. If the regression equation turns out to be \(\hat{y} = 6.8 + 3.4x\), then spending $3,000 on advertising (\(x = 3\)) predicts sales of \(\hat{y} = 6.8 + 3.4(3) = 17\), that is, $17,000.
The slope measures the direction and steepness of the line.
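For reference, the standard least-squares slope in simple regression is the sum of the cross-products of deviations divided by the sum of squared deviations in \(x\). The symbols follow the column headings of the worked table:

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

A positive \(b_1\) means \(y\) tends to rise as \(x\) rises; a negative \(b_1\) means it tends to fall.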
Apex Retail — Worked Data
| Store | \(x\) (Ad spend, $000) | \(y\) (Sales, $000) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i-\bar{x})(y_i-\bar{y})\) | \((x_i-\bar{x})^2\) |
|---|---|---|---|---|---|---|
| 1 | 1 | 10 | −2 | −7 | 14 | 4 |
| 2 | 2 | 14 | −1 | −3 | 3 | 1 |
| 3 | 3 | 16 | 0 | −1 | 0 | 0 |
| 4 | 4 | 22 | 1 | 5 | 5 | 1 |
| 5 | 5 | 23 | 2 | 6 | 12 | 4 |
| Mean | \(\bar{x}=3\) | \(\bar{y}=17\) | — | — | \(\sum=34\) | \(\sum=10\) |
$$b_1 = \frac{34}{10} = 3.4$$
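The table arithmetic can be checked in a few lines of Python. This is a minimal sketch using only the five Apex Retail data points; the function and variable names are illustrative choices, not part of the lesson.

```python
# Apex Retail data from the worked table (both in $000).
ad_spend = [1, 2, 3, 4, 5]
sales = [10, 14, 16, 22, 23]


def least_squares_slope(x, y):
    """Return b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations (the 34 in the table).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared x deviations (the 10 in the table).
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


b1 = least_squares_slope(ad_spend, sales)
print(round(b1, 2))  # 3.4
```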
Once you have the slope, the intercept is straightforward: it anchors the line to pass through the point \((\bar{x},\, \bar{y})\).
With \(b_1 = 3.4\), \(\bar{x} = 3\) and \(\bar{y} = 17\), the intercept is \(b_0 = \bar{y} - b_1\bar{x}\):
$$b_0 = 17 - 3.4 \times 3 = 17 - 10.2 = 6.8$$
Final equation: \(\hat{y} = 6.8 + 3.4x\)
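The fitted equation can be turned into a small prediction helper. The sketch below takes the means and slope from the worked example as given; `predict_sales` is a hypothetical name chosen for illustration.

```python
x_bar, y_bar, b1 = 3, 17, 3.4  # values from the worked example

# The fitted line passes through (x_bar, y_bar), so b0 = y_bar - b1 * x_bar.
b0 = y_bar - b1 * x_bar


def predict_sales(ad_spend_thousands):
    """Predicted monthly sales in $000 for advertising spend in $000."""
    return b0 + b1 * ad_spend_thousands


print(round(b0, 2))                # 6.8
print(round(predict_sales(3), 2))  # 17.0
```

The observed advertising spend runs from 1 to 5 ($000), so predictions for values far outside that range are extrapolations and should be treated with caution.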
What do the slope and intercept actually mean in business?
The Marketing Manager asks: "Is advertising worth the cost?"
The slope of 3.4 means that each additional $1,000 of advertising is associated with a predicted $3,400 increase in monthly sales, a 3.4× predicted return in sales. This supports continued advertising investment, though the profit margin on those sales must also be considered.
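To make the profit-margin caveat concrete, here is a rough sketch of the break-even calculation. It assumes a single gross margin applies to all incremental sales; the 40% figure and the function name are hypothetical illustrations, not values from the Apex Retail data.

```python
SLOPE = 3.4  # predicted extra sales ($000) per extra $1,000 of advertising


def incremental_profit(extra_ad_thousands, gross_margin):
    """Predicted extra profit in $000 after paying for the extra advertising.

    gross_margin is a hypothetical fraction of sales kept as profit.
    """
    extra_sales = SLOPE * extra_ad_thousands
    return extra_sales * gross_margin - extra_ad_thousands


# Advertising pays for itself only if margin * 3.4 >= 1.
break_even_margin = 1 / SLOPE
print(round(break_even_margin, 3))             # 0.294
print(round(incremental_profit(1, 0.40), 2))   # 0.36
```

Under these assumptions, advertising adds to profit only when the gross margin exceeds roughly 29%.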
Using our fitted equation \(\hat{y} = 6.8 + 3.4x\):
| Store | \(y_i\) | \(\hat{y}_i\) | Residual \(e_i\) | \(e_i^2\) | \((y_i-\bar{y})^2\) |
|---|---|---|---|---|---|
| 1 | 10 | 10.2 | −0.2 | 0.04 | 49 |
| 2 | 14 | 13.6 | 0.4 | 0.16 | 9 |
| 3 | 16 | 17.0 | −1.0 | 1.00 | 1 |
| 4 | 22 | 20.4 | 1.6 | 2.56 | 25 |
| 5 | 23 | 23.8 | −0.8 | 0.64 | 36 |
| Sum | — | — | — | \(SSE=4.40\) | \(SST=120\) |
Step-by-step:
1. \(SST = \sum(y_i - \bar{y})^2 = 120\)
2. \(SSE = \sum e_i^2 = 4.40\)
3. \(SSR = SST - SSE = 120 - 4.40 = 115.60\)
4. \(R^2 = SSR / SST = 115.60 / 120 \approx 0.963\)

About 96.3% of the variation in monthly sales across these five stores is explained by advertising spend. This is a very strong fit.
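The sums of squares and \(R^2\) can be recomputed directly from the data and the fitted equation. The following is a minimal sketch with illustrative variable names, using only the values from the worked tables.

```python
ad_spend = [1, 2, 3, 4, 5]
sales = [10, 14, 16, 22, 23]
b0, b1 = 6.8, 3.4  # fitted intercept and slope

# Fitted values and residuals e_i = y_i - y_hat_i.
fitted = [b0 + b1 * x for x in ad_spend]
residuals = [y - y_hat for y, y_hat in zip(sales, fitted)]

y_bar = sum(sales) / len(sales)
sst = sum((y - y_bar) ** 2 for y in sales)  # total variation
sse = sum(e ** 2 for e in residuals)        # unexplained variation
ssr = sst - sse                             # explained variation
r_squared = ssr / sst

print(round(sse, 2), round(ssr, 2), round(r_squared, 3))  # 4.4 115.6 0.963
```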
Checking whether the model's assumptions are met
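Linear regression rests on assumptions about the errors: a linear relationship, independence, constant variance, and approximate normality. We check the residuals to see whether those assumptions hold.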
Residual Plots — What to Look For
Plot residuals against \(\hat{y}\) (or \(x\)). A random scatter around zero, with no curve or funnel shape, suggests that the linear form and constant-variance assumptions are reasonable.
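A residual plot for the Apex Retail fit could be sketched as follows. The plotting helper assumes matplotlib is installed and imports it only when called; both function names are illustrative. With just five points, any pattern should be read cautiously.

```python
ad_spend = [1, 2, 3, 4, 5]
sales = [10, 14, 16, 22, 23]
b0, b1 = 6.8, 3.4  # fitted intercept and slope


def fitted_and_residuals(x, y, intercept, slope):
    """Return a list of (fitted value, residual) pairs, one per observation."""
    pairs = []
    for xi, yi in zip(x, y):
        y_hat = intercept + slope * xi
        pairs.append((y_hat, yi - y_hat))
    return pairs


def plot_residuals(pairs):
    """Scatter residuals against fitted values with a zero reference line."""
    import matplotlib.pyplot as plt  # assumed installed; imported lazily

    fitted = [p[0] for p in pairs]
    resid = [p[1] for p in pairs]
    plt.scatter(fitted, resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Fitted sales (thousands)")
    plt.ylabel("Residual (thousands)")
    plt.title("Residuals vs fitted values")
    plt.show()


pairs = fitted_and_residuals(ad_spend, sales, b0, b1)
for y_hat, e in pairs:
    print(f"{y_hat:5.1f} {e:+5.1f}")
```

Calling `plot_residuals(pairs)` draws the chart; the printed pairs match the residual column of the table above.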
Ethical Issues in Linear Regression