WEEK 11
Apex Retail Solutions

Simple Linear Regression

Modelling the relationship between two numerical variables

Can advertising spend predict monthly sales? Linear regression gives us a mathematical answer — and a decision-making tool for accountants and business analysts.

Master of Accounting — Introductory Statistics  |  Learning outcomes: coefficients, interpretation, R², residual analysis, ethics
Section 1

The Regression Equation

What is linear regression and how do we build the line?

1.1 What Is Simple Linear Regression?

Definition: A statistical method that models the straight-line relationship between one independent variable \(x\) and one dependent variable \(y\).

We call it simple because there is exactly one predictor variable.

Terminology

  • Dependent variable \(y\): what we want to predict (e.g. monthly sales)
  • Independent variable \(x\): what we use to predict (e.g. ad spend)
  • Regression line: the best-fitting straight line through the data
[Figure: scatter plot of Sales ($000) against Ad Spend ($000), x-axis from 1 to 5, with the fitted line ŷ = b₀ + b₁x.]
A scatter plot with the regression line fitted through the data.

1.2 The Regression Equation

Simple Linear Regression Model
$$\hat{y} = b_0 + b_1 x$$
\(\hat{y}\)
Predicted value of the dependent variable
\(b_0\) — Intercept
Predicted value of \(y\) when \(x = 0\)
\(b_1\) — Slope
Change in \(\hat{y}\) for each 1-unit increase in \(x\)
Scenario — Apex Retail Solutions

An analyst is studying whether advertising spend (in $000) can predict monthly sales (in $000) across 5 store locations. If the regression equation turns out to be \(\hat{y} = 6.8 + 3.4x\), then spending $3,000 on advertising (\(x = 3\)) gives \(\hat{y} = 6.8 + 3.4(3) = 17.0\), that is, predicted sales of $17,000.

Note: The hat symbol \(\hat{y}\) (pronounced "y-hat") reminds us this is a predicted value, not an actual observed value.
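
The prediction step can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `predict_sales` and the constants `B0` and `B1` are hypothetical labels, and the coefficients are the ones quoted in the scenario.

```python
# Coefficients from the Apex Retail scenario; all units are thousands of dollars.
B0 = 6.8  # intercept
B1 = 3.4  # slope


def predict_sales(ad_spend_thousands: float) -> float:
    """Return predicted monthly sales ($000) for a given ad spend ($000)."""
    return B0 + B1 * ad_spend_thousands


# $3,000 of advertising corresponds to x = 3.
y_hat = predict_sales(3)
print(f"Predicted sales: ${y_hat * 1000:,.0f}")
```

Keeping both variables in thousands, as the slides do, means the result has to be multiplied by 1,000 before it is reported in dollars.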

1.3 Calculating the Slope \(b_1\)

Slope Formula
$$b_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}$$

The slope measures the direction and steepness of the line.

  • \(b_1 > 0\): as \(x\) increases, \(y\) increases
  • \(b_1 < 0\): as \(x\) increases, \(y\) decreases
  • \(b_1 = 0\): no linear relationship

Apex Retail — Worked Data

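| Store | \(x\) (Ad $000) | \(y\) (Sales $000) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | Product \((x_i-\bar{x})(y_i-\bar{y})\) | \((x_i-\bar{x})^2\) |
|---|---|---|---|---|---|---|
| 1 | 1 | 10 | −2 | −7 | 14 | 4 |
| 2 | 2 | 14 | −1 | −3 | 3 | 1 |
| 3 | 3 | 16 | 0 | −1 | 0 | 0 |
| 4 | 4 | 22 | 1 | 5 | 5 | 1 |
| 5 | 5 | 23 | 2 | 6 | 12 | 4 |
| Mean / Sum | \(\bar{x}=3\) | \(\bar{y}=17\) | | | \(\sum=34\) | \(\sum=10\) |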

$$b_1 = \frac{34}{10} = 3.4$$
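
The slope calculation can be reproduced with plain Python. This is a sketch using the five Apex Retail stores; the variable names `s_xy` and `s_xx` are shorthand introduced here for the numerator and denominator of the slope formula, not notation from the slides.

```python
# Apex Retail data: one entry per store, both in thousands of dollars.
ad_spend = [1, 2, 3, 4, 5]       # x
sales = [10, 14, 16, 22, 23]     # y

x_bar = sum(ad_spend) / len(ad_spend)
y_bar = sum(sales) / len(sales)

# Numerator: sum of products of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales))
# Denominator: sum of squared deviations of x from its mean.
s_xx = sum((x - x_bar) ** 2 for x in ad_spend)

b1 = s_xy / s_xx
print(f"x_bar={x_bar}, y_bar={y_bar}, s_xy={s_xy}, s_xx={s_xx}, b1={b1}")
```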

1.4 Calculating the Intercept \(b_0\)

Intercept Formula
$$b_0 = \bar{y} - b_1 \bar{x}$$

Once you have the slope, the intercept is straightforward: it anchors the line to pass through the point \((\bar{x},\, \bar{y})\).

Apex Retail Calculation

We found \(b_1 = 3.4\), \(\bar{x} = 3\), \(\bar{y} = 17\)

$$b_0 = 17 - 3.4 \times 3 = 17 - 10.2 = 6.8$$

Final equation: \(\hat{y} = 6.8 + 3.4x\)

[Figure: regression line on axes x (Ad Spend $000) and y (Sales $000), marking the intercept b₀ = 6.8 and the mean point (x̄, ȳ) = (3, 17).]
The regression line always passes through the point \((\bar{x}, \bar{y})\).
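
A short check of the intercept formula, and of the claim that the fitted line passes through the point of means, might look like this. It is a sketch that takes the slope and means from the worked example as given; small floating-point rounding is expected, so the values are rounded for display.

```python
# Values taken from the Apex Retail worked example (units: $000).
x_bar = 3
y_bar = 17
b1 = 3.4

# Intercept formula: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

# Evaluating the fitted line at x_bar should give back y_bar.
y_hat_at_mean = b0 + b1 * x_bar

print(f"b0 = {round(b0, 2)}")
print(f"y-hat at x_bar = {round(y_hat_at_mean, 2)}")
```

Substituting the intercept formula into the line shows why this works: \(b_0 + b_1\bar{x} = (\bar{y} - b_1\bar{x}) + b_1\bar{x} = \bar{y}\).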

Knowledge Check — Section 1

Q1. For the Apex Retail data, the regression equation is \(\hat{y} = 6.8 + 3.4x\). If a store spends $4,000 on advertising, what are predicted monthly sales?
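Substitute \(x = 4\): \(\hat{y} = 6.8 + 3.4(4) = 6.8 + 13.6 = 20.4\), that is, predicted sales of $20,400. Remember: both variables are in thousands of dollars, so \(x = 4\) means $4,000 of ad spend and \(\hat{y} = 20.4\) means $20,400 of sales.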

Q2. The slope \(b_1\) is calculated as \(\sum(x_i - \bar{x})(y_i - \bar{y})\) divided by which of the following?
The slope formula is \(b_1 = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2}\). The denominator measures the spread in \(x\) alone.
Section 2

Interpreting the Model

What do the slope and intercept actually mean in business?

2.1 Interpreting Slope and Intercept

Interpreting \(b_1 = 3.4\):
For every additional $1,000 spent on advertising, monthly sales are predicted to increase by $3,400.
Interpreting \(b_0 = 6.8\):
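When advertising spend is $0, predicted monthly sales are $6,800. This can be read as baseline sales without advertising, but no store in the data spent $0 (observed spend runs from $1,000 to $5,000), so the figure is an extrapolation.
Caution on the intercept: Interpreting the prediction at \(x = 0\) is only meaningful if \(x = 0\) is realistic and lies within or near the observed range of the data. A negative intercept might simply indicate the model should not be used at very low values of \(x\).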
Decision Brief — Marketing Manager

The Marketing Manager asks: "Is advertising worth the cost?"

The slope of 3.4 means every $1,000 invested in advertising is associated with a predicted $3,400 in additional sales, a 3.4× predicted return in revenue terms. This supports continued advertising investment, though the actual profit margin on those sales must also be considered.

Always state the units when interpreting. "For every one unit increase in x, y increases by b₁ units" is incomplete without naming what x and y represent.
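
To make the units explicit, the slope interpretation can be written as a small function. The sketch below is illustrative only: `predicted_sales_change` is a hypothetical helper, and the break-even margin rests on a simplifying assumption that is not in the slides, namely that advertising is the only extra cost and that profit is a constant fraction of sales.

```python
B1 = 3.4  # slope: $000 of predicted sales per $000 of advertising


def predicted_sales_change(extra_ad_dollars: float) -> float:
    """Predicted change in monthly sales (dollars) for extra ad spend (dollars)."""
    # The slope is a ratio of like units, so it applies to dollars as well as $000.
    return B1 * extra_ad_dollars


# Assumption (hypothetical): profit = margin * sales, and the ad spend is the only cost.
# Advertising pays for itself when B1 * margin * spend >= spend, i.e. margin >= 1 / B1.
break_even_margin = 1 / B1

print(f"Extra $1,000 of ads -> predicted extra sales of ${predicted_sales_change(1000):,.0f}")
print(f"Break-even profit margin under the stated assumption: {break_even_margin:.1%}")
```

Under that assumption the margin would need to be roughly 29% or higher for the extra advertising to cover its own cost, which is why the brief says the margin must be considered alongside the slope.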

2.2 The Coefficient of Determination — \(R^2\)

\(R^2\) measures how much of the variation in \(y\) is explained by the regression model. It ranges from 0 to 1. $$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
  • SST (Total): total variation in \(y\)
  • SSR (Regression): variation explained by \(x\)
  • SSE (Error): unexplained variation (residuals)
Interpretation: \(R^2 = 0.96\) means 96% of the variation in monthly sales is explained by advertising spend.
[Figure: SST (total variation) split into SSR (explained) and SSE (unexplained): SST = SSR + SSE, R² = SSR / SST.]
SST is partitioned into explained (SSR) and unexplained (SSE) variation.

2.3 Calculating \(R^2\) — Apex Retail

Using our fitted equation \(\hat{y} = 6.8 + 3.4x\):

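| Store | \(y_i\) | \(\hat{y}_i\) | Residual \(e_i\) | \(e_i^2\) | \((y_i-\bar{y})^2\) |
|---|---|---|---|---|---|
| 1 | 10 | 10.2 | −0.2 | 0.04 | 49 |
| 2 | 14 | 13.6 | 0.4 | 0.16 | 9 |
| 3 | 16 | 17.0 | −1.0 | 1.00 | 1 |
| 4 | 22 | 20.4 | 1.6 | 2.56 | 25 |
| 5 | 23 | 23.8 | −0.8 | 0.64 | 36 |
| Sum | | | | \(SSE=4.40\) | \(SST=120\) |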

Step-by-step:

1. \(SST = \sum(y_i - \bar{y})^2 = 120\)

2. \(SSE = \sum e_i^2 = 4.40\)

3. \(SSR = SST - SSE = 120 - 4.40 = 115.60\)

$$R^2 = \frac{115.60}{120} = 0.963$$
Interpretation

96.3% of the variation in monthly sales is explained by advertising spend. This is a very strong fit.
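
The same three steps can be checked in Python. This is a sketch that reuses the Apex Retail data and the fitted coefficients from Section 1; the variable names are chosen here for readability and results are rounded for display because of floating-point arithmetic.

```python
# Apex Retail data and fitted coefficients (units: $000).
ad_spend = [1, 2, 3, 4, 5]
sales = [10, 14, 16, 22, 23]
b0, b1 = 6.8, 3.4

y_bar = sum(sales) / len(sales)
predicted = [b0 + b1 * x for x in ad_spend]

# Step 1: total variation in y around its mean.
sst = sum((y - y_bar) ** 2 for y in sales)
# Step 2: unexplained variation (sum of squared residuals).
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))
# Step 3: explained variation, by subtraction.
ssr = sst - sse

r_squared = ssr / sst
print(f"SST={sst:.2f}, SSE={sse:.2f}, SSR={ssr:.2f}, R^2={r_squared:.3f}")
```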

Knowledge Check — Section 2

Q3. A regression model produces SST = 200 and SSE = 30. What is \(R^2\)?
\(SSR = SST - SSE = 200 - 30 = 170\). Then \(R^2 = 170/200 = 0.85\). This means 85% of variation in \(y\) is explained by the model.

Q4. An analyst says: "R² = 0.04, so our model explains very little of the variation in costs." Is this a good or poor model?
R² = 0.04 means only 4% of variation is explained — a very weak model. We should either find a better predictor or add more variables (multiple regression). Low R² does not mean "good generalisation" in this context.
Section 3

Residual Analysis

Checking whether the model's assumptions are met

3.1 What Is a Residual?

A residual is the difference between the actual observed value and the predicted value: $$e_i = y_i - \hat{y}_i$$ It represents what the model could not explain.

Why examine residuals?

Linear regression rests on assumptions. We check residuals to see if those assumptions hold:

  • Linearity: residuals should show no curved pattern
  • Equal variance: spread of residuals should be constant (homoscedasticity)
  • Normality: residuals should be roughly normally distributed
  • Independence: residuals should not be correlated with each other
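[Figure: scatter plot of y (Sales) against x (Ad Spend) with residuals drawn as vertical segments, for example e₄ = +1.6 and e₃ = −1.0. A positive residual means the point lies above the line.]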
Residuals are vertical distances from data points to the regression line.
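
As a sketch, the residuals for the Apex Retail stores can be computed directly from the definition \(e_i = y_i - \hat{y}_i\). The example also checks a general property of least-squares lines fitted with an intercept: the residuals sum to zero, up to floating-point rounding.

```python
# Apex Retail data and fitted coefficients (units: $000).
ad_spend = [1, 2, 3, 4, 5]
sales = [10, 14, 16, 22, 23]
b0, b1 = 6.8, 3.4

# Residual = actual value minus predicted value, one per store.
residuals = [y - (b0 + b1 * x) for x, y in zip(ad_spend, sales)]

for store, e in enumerate(residuals, start=1):
    position = "above" if e > 0 else "below"
    print(f"Store {store}: e = {e:+.1f} ({position} the line)")

print(f"Sum of residuals: {sum(residuals):.6f}")
```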

3.2 Residual Plots & 4.1 Ethical Issues

Residual Plots — What to Look For

[Figure: two residual plots. Good: random scatter, so linearity holds. Bad: curved pattern, so the relationship is non-linear and a straight-line model is not appropriate.]

Plot residuals against \(\hat{y}\) (or \(x\)). A random scatter around zero, with no curve and roughly constant spread, suggests the linearity and equal-variance assumptions are reasonable.
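
Without drawing a plot, the difference between the two patterns can be seen in the signs of the residuals. The sketch below fits a line to the Apex Retail data and to a deliberately curved data set (\(y = x^2\), a hypothetical example that is not from the slides); `fit_line` and `residual_signs` are illustrative helper names.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope using the formulas from Section 1."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residual_signs(xs, ys):
    """Return a string of '+' / '-' for each residual, in order of x."""
    b0, b1 = fit_line(xs, ys)
    # A residual of exactly zero is shown as '-' for simplicity.
    return "".join("+" if y - (b0 + b1 * x) > 0 else "-" for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
apex_sales = [10, 14, 16, 22, 23]   # data from the slides
curved = [x ** 2 for x in xs]       # hypothetical curved relationship

print("Apex Retail:", residual_signs(xs, apex_sales))
print("Curved data:", residual_signs(xs, curved))
```

For the Apex data the signs alternate with no obvious run, while the curved data gives positive residuals at both ends and negative ones in the middle, which is the U-shape a residual plot would show. With only five points this is a rough indication, not a formal test.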

Ethical Issues in Linear Regression

Correlation ≠ Causation: A high R² does not mean \(x\) causes \(y\). Ice cream sales and drowning rates are correlated — but the cause is summer heat.
Extrapolation: Using the model to predict far outside the observed range of \(x\) is unreliable and potentially misleading.
Omitted variables: Attributing outcomes to one variable while ignoring others (e.g., claiming advertising alone drives sales, ignoring seasonal effects) can lead to flawed decisions.
Always ask: Does the relationship make logical business sense? Are there confounding variables? Who might be harmed by incorrect predictions?
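
One practical safeguard against the extrapolation issue is to have the prediction code flag inputs outside the observed range. The following is a minimal sketch under stated assumptions: the function name `predict_with_range_check` is hypothetical, and the range limits are the smallest and largest ad spend in the Apex Retail data ($1,000 to $5,000).

```python
import warnings

# Fitted coefficients and observed range of x from the Apex Retail data ($000).
B0, B1 = 6.8, 3.4
X_MIN, X_MAX = 1, 5


def predict_with_range_check(ad_spend_thousands: float) -> float:
    """Predict sales ($000), warning when the input requires extrapolation."""
    if not X_MIN <= ad_spend_thousands <= X_MAX:
        warnings.warn(
            f"Ad spend {ad_spend_thousands} is outside the observed range "
            f"[{X_MIN}, {X_MAX}] ($000); the prediction is an extrapolation.",
            stacklevel=2,
        )
    return B0 + B1 * ad_spend_thousands


# Within the observed range: no warning is raised.
print(round(predict_with_range_check(3), 1))
```

Calling the function with, say, `predict_with_range_check(10)` still returns a number, but the warning makes it visible that the model is being used beyond the data it was fitted on.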
