Self-directed activity — work through each part at your own pace using the dataset below.
Apex Retail Solutions is an Australian retail analytics firm. The marketing team wants to know whether advertising expenditure can reliably predict monthly sales revenue. You are the analyst. Using data from 5 store locations, you will build, interpret, and evaluate a simple linear regression model — entirely in Excel.
Open a blank Excel workbook. Enter the data below starting from cell A1.
| Cell | A — Store | B — Ad Spend ($000) | C — Monthly Sales ($000) |
|---|---|---|---|
| Row 1 | Store | Ad_Spend | Monthly_Sales |
| Row 2 | 1 | 1 | 10 |
| Row 3 | 2 | 2 | 14 |
| Row 4 | 3 | 3 | 16 |
| Row 5 | 4 | 4 | 22 |
| Row 6 | 5 | 5 | 23 |
Select the range B1:C6 (Ad Spend and Monthly Sales, including headers).
Go to Insert → Charts → Scatter and choose "Scatter (X,Y)" — the plain dots with no lines.
Add axis titles: right-click the chart → Add Chart Element → Axis Titles.
Add a linear trendline: right-click any data point → Add Trendline → select Linear → tick "Display Equation on chart" and "Display R-squared value on chart".
Looking at the scatter plot, does the relationship between advertising spend and monthly sales appear linear? Describe what you see.
What equation does Excel display on the trendline? Write it here.
Excel has built-in functions to directly calculate the regression slope and intercept. Use these to verify your trendline equation.
In cell E1, type the label: Slope (b1)
In cell F1, enter the formula:
Syntax: =SLOPE(known_ys, known_xs)
In cell E2, type the label: Intercept (b0)
In cell F2, enter:
What values does Excel return for the slope and intercept?
Write the full regression equation and interpret the slope in one sentence using the actual variable names (advertising spend, monthly sales).
Residuals tell us how far each actual observation is from the regression line.
In cell D1, type the header: Predicted_Sales
In cell D2, enter a formula using your slope (F1) and intercept (F2):
Then copy this formula down to D3:D6 by dragging the fill handle.
In cell E3, type the header: Residual (in cell E1 if free, else use column F or G).
Calculate the residual for each store as: Actual − Predicted
Copy down to all 5 rows.
Complete the table. What are the predicted sales and residuals for each store?
What is the sum of all residuals? Use =SUM(...) to check. What do you notice, and why does this make mathematical sense?
In a new area (e.g., cells E5:F8), build the following summary table:
| Cell | Label | Formula |
|---|---|---|
| E5 | SST | =DEVSQ(C2:C6) |
| E6 | SSE | =SUMXMY2(C2:C6,D2:D6) |
| E7 | SSR | =E5-E6 |
| E8 | R² | =E7/E5 |
Alternatively, use Excel's built-in R² function to double-check:
Syntax: =RSQ(known_ys, known_xs)
What are the values of SST, SSE, SSR, and R²? Confirm that SST = SSR + SSE.
You are presenting results to the CFO. Write a one-sentence business interpretation of R² = 0.963.
A residual plot checks whether our linear model is appropriate.
Select the Ad Spend values (B2:B6) and the Residuals column together (hold Ctrl to select non-adjacent columns). Insert a Scatter chart.
Add a horizontal reference line at zero: right-click the chart → Add Chart Element → Lines → No lines. Alternatively, draw a manual horizontal line using Insert → Shapes.
The zero line represents "perfect prediction." Points above it have positive residuals; points below have negative residuals.
Label axes: x-axis = "Advertising Spend ($000)", y-axis = "Residual ($000)"
Describe what you see in the residual plot. Is there a pattern, or do the residuals appear randomly scattered around zero?
Which store has the largest residual (in absolute value)? What does this tell us about that store's actual sales compared to the model's prediction?
In cell H1, type: New Ad Spend. In I1: Predicted Sales.
In H2, enter a value (e.g., 3.5 for $3,500 spend).
In I2, enter:
What does the model predict for monthly sales if a store spends $3,500 on advertising?
Now try entering x = 20 ($20,000 ad spend). What does the model predict? Is it safe to use this prediction? Explain.
Management is considering using this model to set advertising budgets for all 50 stores across Australia. List two ethical concerns a prudent analyst should raise before this decision is made.
| Function | Purpose | Syntax |
|---|---|---|
| =SLOPE() | Calculates regression slope b₁ | =SLOPE(known_ys, known_xs) |
| =INTERCEPT() | Calculates regression intercept b₀ | =INTERCEPT(known_ys, known_xs) |
| =RSQ() | Calculates R² | =RSQ(known_ys, known_xs) |
| =DEVSQ() | Calculates SST = Σ(y − ȳ)² | =DEVSQ(range) |
| =SUMXMY2() | Calculates SSE = Σ(actual − predicted)² | =SUMXMY2(range1, range2) |
| =FORECAST.LINEAR() | Predicts ŷ for a new x value | =FORECAST.LINEAR(x, known_ys, known_xs) |