WEEK 1 WORKSHOP

Data-Driven Exercises

Exploring statistics concepts with real data in Excel

80

Customer records

14

Variables to classify

3

Excel sheets

Open Week1_Coffee_Survey_Data.xlsx and follow along

THE SCENARIO

Melbourne Coffee Shop Survey

A new coffee shop in Melbourne's CBD has surveyed 80 customers about their habits and preferences. The data has been collected and entered into a spreadsheet.

Your job today is to explore the dataset, classify variables, compute descriptive statistics, and answer questions using Excel.

Before you start

Is this dataset a population or a sample? What would the population be?

Sheet 1: Customer Survey

80 rows of raw survey data with 14 variables

Sheet 2: Data Dictionary

Variable descriptions with columns for you to fill in data type and scale

Sheet 3: Analysis Workspace

Pre-built formulas and blank cells for your own analysis

DATASET PREVIEW

The Variables

14 columns covering demographics, behaviour, and satisfaction

VariableExample ValuesWhat It Records
Customer_IDCUST-1001, CUST-1002Unique identifier
Age24, 37, 55Customer age in years
GenderMale, Female, Non-binarySelf-reported gender
SuburbCBD, Fitzroy, St KildaWhere the customer lives
Preferred_DrinkFlat White, Latte, MochaMost ordered drink
Monthly_Visits3, 8, 15Visits per month
Items_Per_Visit1, 2, 3Items bought each visit
Avg_Spend_AUD$8.50, $14.20, $22.75Average spend per visit
Wait_Time_Min2.3, 5.1, 8.7Average wait in minutes
SatisfactionSatisfied, Very SatisfiedOverall satisfaction level
MembershipYes, NoLoyalty membership status
Distance_KM0.5, 2.3, 7.8Distance from home (km)
Payment_MethodCash, Card, Mobile PayPreferred payment type
Visit_TimeMorning, Afternoon, EveningTypical visit time of day
EXERCISE 1

Classify Every Variable

Open the "Data Dictionary" sheet and fill in the ??? columns

Your Task

  1. Go to the Data Dictionary sheet in the Excel file.
  2. For each of the 14 variables, replace the ??? with the correct classification.
  3. Data Type column: Is it Categorical or Numerical?
  4. Scale column: Is it Nominal, Ordinal, Discrete, or Continuous?
  5. Pay attention to the Notes column for hints on tricky variables.

Hint

Customer_ID looks numerical, but ask yourself: does it make sense to calculate the average Customer_ID? If the answer is no, it's probably not numerical data.

CHECK YOUR WORK

Classification Check

What type of data is Avg_Spend_AUD ($8.50, $14.20, $22.75)?
Correct! Spending is measured (not counted), making it continuous. It has a true zero point ($0 means no spending), so it's ratio scale. While currency is technically discrete at the cent level, in practice we treat dollar amounts as continuous.
Not quite. Spending is a measured value (continuous) with a true zero point (ratio). While you could argue currency is discrete at the cent level, it is commonly treated as continuous data in statistics.
CHECK YOUR WORK

Classification Check

Satisfaction has values: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied. What is the correct classification?
Correct! The responses are categories (not numbers), so it's categorical. There is a clear natural order from Very Dissatisfied to Very Satisfied, making it ordinal.
These are text categories with a natural ranking. That makes them categorical and ordinal. Just because something can be ranked doesn't make it numerical.
ANSWER KEY

Variable Classifications

VariableTypeSub-typeScaleKey Reasoning
Customer_IDCategorical--NominalAn identifier, not a meaningful number
AgeNumericalDiscreteRatioWhole years, true zero exists
GenderCategorical--NominalLabels with no natural order
SuburbCategorical--NominalLocation names, no order
Preferred_DrinkCategorical--NominalDrink names, no natural ranking
Monthly_VisitsNumericalDiscreteRatioCounted whole visits, true zero
Items_Per_VisitNumericalDiscreteRatioCounted whole items
Avg_Spend_AUDNumericalContinuousRatioMeasured, $0 is meaningful zero
Wait_Time_MinNumericalContinuousRatioMeasured time, true zero
SatisfactionCategorical--OrdinalOrdered categories
MembershipCategorical--NominalYes/No, no ranking
Distance_KMNumericalContinuousRatioMeasured distance, true zero
Payment_MethodCategorical--NominalPayment types, no natural order
Visit_TimeCategorical--OrdinalMorning → Afternoon → Evening has a natural order
EXERCISE 2

Descriptive Statistics in Excel

Open the "Analysis Workspace" sheet — Part 1 has formulas ready

Your Task

  1. Go to the Analysis Workspace sheet.
  2. Look at Part 1 — the formulas are already written.
  3. Study each formula and understand what it calculates.
  4. Verify the results make sense.
  5. Then try writing your own formulas in Part 3.

Key Excel Functions

COUNTA — counts non-empty cells
AVERAGE — calculates mean
MIN / MAX — smallest / largest value
COUNTIF — counts cells matching a condition
AVERAGEIF — averages cells matching a condition

Example: Counting customers

=COUNTA('Customer Survey'!A2:A81)

Result: 80 customers

Example: Average age

=AVERAGE('Customer Survey'!B2:B81)

Result: 36.70 years

Example: Count by category

=COUNTIF('Customer Survey'!C2:C81,"Male")

Result: 35 male customers

WHAT THE DATA TELLS US

Dataset Snapshot

These are the results you should see in the Analysis Workspace

80
Customers Surveyed
36.7
Average Age
$13.03
Average Spend
4.6 min
Average Wait
7.4
Avg Monthly Visits
31
Members
24
Very Satisfied
19–63
Age Range
EXERCISE 3

Write Your Own Formulas

Part 3 of the Analysis Workspace — write formulas in the yellow cells

Questions to Answer

  1. What is the average spend of members only?
  2. What is the average spend of non-members?
  3. How many customers visit in the morning?
  4. What is the average wait time for CBD customers?
  5. How many customers are under 25 years old?

New function: AVERAGEIF

Averages cells where a condition is met

=AVERAGEIF(range, "criteria", average_range)

Example: Average spend where Membership = "Yes"

=AVERAGEIF(K2:K81,"Yes",H2:H81)

New function: COUNTIF with operators

Count cells matching a comparison

=COUNTIF(B2:B81,"<25")

Counts all customers aged under 25

KNOWLEDGE CHECK

Excel Formula Check

Which formula correctly counts the number of customers who are "Satisfied" or "Very Satisfied"?
Correct! COUNTIF only accepts one criterion at a time. To count multiple categories, add separate COUNTIF calls together. The result is 34 + 24 = 58 customers.
COUNTIF only takes a single criterion. To count multiple categories, you need to add two separate COUNTIF results together: =COUNTIF(...,"Satisfied") + COUNTIF(...,"Very Satisfied").
EXERCISE 4

Think Critically

These questions don't need formulas — use the data and your thinking

Q1: Population vs Sample

The shop serves approximately 2,000 unique customers per month. We surveyed 80. What is the population? What is the sample? What is a parameter we don't know? What statistic did we calculate?

Q2: Sampling Method

The 80 customers were surveyed by approaching people in the shop on a Tuesday afternoon. What sampling method is this? What potential bias exists?

Q3: Coverage Error

This survey only captures people who physically visit the shop. The shop also does Uber Eats deliveries. Is there a coverage error? How would you fix it?

Q4: Measurement Error

The Satisfaction variable was collected by a barista asking customers face-to-face: "How happy are you with us today?" Could this introduce measurement error? What type?

Discussion

If you could redesign this survey, what sampling method would you use and what would you change about how satisfaction is collected?

EXERCISE 5

Data Investigation

Use Excel to answer these business questions

Questions

  1. Do members spend more on average than non-members? By how much?
  2. Which suburb has the most customers? (Try using COUNTIF for each suburb, or use a PivotTable.)
  3. Is the average wait time different for morning vs evening visitors?
  4. What percentage of all customers are "Very Satisfied"?
  5. What is the most common payment method among customers aged under 30? (This is a challenge — look at the data manually or try filtering.)

Helpful Formulas

Percentage:

=COUNTIF(J2:J81,"Very Satisfied")/COUNTA(J2:J81)

Try using Filters:
Click a column header, then Data → Filter. This lets you view subsets of the data without formulas.

Try a PivotTable:
Select the data, Insert → PivotTable. Drag Suburb to Rows and Customer_ID to Values to count.

KNOWLEDGE CHECK

Conceptual Check

The barista asks customers face-to-face: "How happy are you with us today?" and 72.5% report being Satisfied or Very Satisfied. This high rate is most likely influenced by:
Correct! When asked face-to-face by the person who served them, customers feel socially obligated to give a positive answer. This is the halo effect — a form of measurement error. An anonymous online survey would reduce this bias.
While other errors may exist, the most direct issue with face-to-face satisfaction questions asked by the service provider is the halo effect — customers feel pressure to be positive. This is a measurement error.
CHALLENGE

Design a Better Survey

Apply everything you've learned today

Your Brief

The coffee shop owner wants to understand why some customers visit frequently and others don't. Design a short survey (5–7 questions) that would help answer this.

  1. Write 5–7 survey questions
  2. Classify each variable you'd collect (nominal, ordinal, discrete, continuous)
  3. Define your population and sampling frame
  4. Choose a sampling method and justify it
  5. Identify at least one potential error and how you'd mitigate it

Criteria for good questions

Include at least one of each: a nominal variable, an ordinal variable, a discrete variable, and a continuous variable.

Avoid common mistakes

No leading questions, no double-barrelled questions (asking two things at once), and make sure every question has clear response options.

Discuss with your group

Compare your survey designs. Could your classmates' surveys introduce any errors you hadn't considered?

WRAP UP

What You Practised Today

Data Classification

Classified 14 real variables as nominal, ordinal, discrete, or continuous — including tricky cases like Customer_ID and Visit_Time.

Excel Skills

Used COUNTA, AVERAGE, MIN, MAX, COUNTIF, and AVERAGEIF to compute descriptive statistics from raw data.

Critical Thinking

Identified population vs sample, recognised survey errors, and designed a better survey from scratch.

Next week we'll build on these foundations with data visualisation and probability.

Make sure to complete the tutorial problems (Q1.8–1.10) before next class.