WEEK 1 WORKSHOP

Data-Driven Exercises

Exploring statistics concepts with real data in Excel

80

Customer records

14

Variables to classify

3

Excel sheets

Open Week1_Coffee_Survey_Data.xlsx and follow along

THE SCENARIO

Melbourne Coffee Shop Survey

A new coffee shop in Melbourne's CBD has surveyed 80 customers about their habits and preferences. The data has been collected and entered into a spreadsheet.

Your job today is to explore the dataset, classify variables, compute descriptive statistics, and answer questions using Excel.

Before you start

Is this dataset a population or a sample? What would the population be?

Sheet 1: Customer Survey

80 rows of raw survey data with 14 variables

Sheet 2: Data Dictionary

Variable descriptions with columns for you to fill in data type and scale

Sheet 3: Analysis Workspace

Pre-built formulas and blank cells for your own analysis

DATASET PREVIEW

The Variables

14 columns covering demographics, behaviour, and satisfaction

Variable	Example Values	What It Records
Customer_ID	CUST-1001, CUST-1002	Unique identifier
Age	24, 37, 55	Customer age in years
Gender	Male, Female, Non-binary	Self-reported gender
Suburb	CBD, Fitzroy, St Kilda	Where the customer lives
Preferred_Drink	Flat White, Latte, Mocha	Most ordered drink
Monthly_Visits	3, 8, 15	Visits per month
Items_Per_Visit	1, 2, 3	Items bought each visit
Avg_Spend_AUD	$8.50, $14.20, $22.75	Average spend per visit
Wait_Time_Min	2.3, 5.1, 8.7	Average wait in minutes
Satisfaction	Satisfied, Very Satisfied	Overall satisfaction level
Membership	Yes, No	Loyalty membership status
Distance_KM	0.5, 2.3, 7.8	Distance from home (km)
Payment_Method	Cash, Card, Mobile Pay	Preferred payment type
Visit_Time	Morning, Afternoon, Evening	Typical visit time of day

EXERCISE 1

Classify Every Variable

Open the "Data Dictionary" sheet and fill in the ??? columns

Your Task

Go to the Data Dictionary sheet in the Excel file.
For each of the 14 variables, replace the ??? with the correct classification.
Data Type column: Is it Categorical or Numerical?
Scale column: Is it Nominal, Ordinal, Discrete, or Continuous?
Pay attention to the Notes column for hints on tricky variables.

Hint

Customer_ID looks numerical, but ask yourself: does it make sense to calculate the average Customer_ID? If the answer is no, it's probably not numerical data.

CHECK YOUR WORK

Classification Check

What type of data is Avg_Spend_AUD ($8.50, $14.20, $22.75)?

Correct! Spending is measured (not counted), making it continuous. It has a true zero point ($0 means no spending), so it's ratio scale. While currency is technically discrete at the cent level, in practice we treat dollar amounts as continuous.

Not quite. Spending is a measured value (continuous) with a true zero point (ratio). While you could argue currency is discrete at the cent level, it is commonly treated as continuous data in statistics.

CHECK YOUR WORK

Classification Check

Satisfaction has values: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied. What is the correct classification?

Correct! The responses are categories (not numbers), so it's categorical. There is a clear natural order from Very Dissatisfied to Very Satisfied, making it ordinal.

These are text categories with a natural ranking. That makes them categorical and ordinal. Just because something can be ranked doesn't make it numerical.

ANSWER KEY

Variable Classifications

Variable	Type	Sub-type	Scale	Key Reasoning
Customer_ID	Categorical	--	Nominal	An identifier, not a meaningful number
Age	Numerical	Discrete	Ratio	Whole years, true zero exists
Gender	Categorical	--	Nominal	Labels with no natural order
Suburb	Categorical	--	Nominal	Location names, no order
Preferred_Drink	Categorical	--	Nominal	Drink names, no natural ranking
Monthly_Visits	Numerical	Discrete	Ratio	Counted whole visits, true zero
Items_Per_Visit	Numerical	Discrete	Ratio	Counted whole items
Avg_Spend_AUD	Numerical	Continuous	Ratio	Measured, $0 is meaningful zero
Wait_Time_Min	Numerical	Continuous	Ratio	Measured time, true zero
Satisfaction	Categorical	--	Ordinal	Ordered categories
Membership	Categorical	--	Nominal	Yes/No, no ranking
Distance_KM	Numerical	Continuous	Ratio	Measured distance, true zero
Payment_Method	Categorical	--	Nominal	Payment types, no natural order
Visit_Time	Categorical	--	Ordinal	Morning → Afternoon → Evening has a natural order

EXERCISE 2

Descriptive Statistics in Excel

Open the "Analysis Workspace" sheet — Part 1 has formulas ready

Your Task

Go to the Analysis Workspace sheet.
Look at Part 1 — the formulas are already written.
Study each formula and understand what it calculates.
Verify the results make sense.
Then try writing your own formulas in Part 3.

Key Excel Functions

COUNTA — counts non-empty cells
AVERAGE — calculates mean
MIN / MAX — smallest / largest value
COUNTIF — counts cells matching a condition
AVERAGEIF — averages cells matching a condition

Example: Counting customers

=COUNTA('Customer Survey'!A2:A81)

Result: 80 customers

Example: Average age

=AVERAGE('Customer Survey'!B2:B81)

Result: 36.70 years

Example: Count by category

=COUNTIF('Customer Survey'!C2:C81,"Male")

Result: 35 male customers

WHAT THE DATA TELLS US

Dataset Snapshot

These are the results you should see in the Analysis Workspace

80

Customers Surveyed

36.7

Average Age

$13.03

Average Spend

4.6 min

Average Wait

7.4

Avg Monthly Visits

31

Members

24

Very Satisfied

19–63

Age Range

EXERCISE 3

Write Your Own Formulas

Part 3 of the Analysis Workspace — write formulas in the yellow cells

Questions to Answer

What is the average spend of members only?
What is the average spend of non-members?
How many customers visit in the morning?
What is the average wait time for CBD customers?
How many customers are under 25 years old?

New function: AVERAGEIF

Averages cells where a condition is met

=AVERAGEIF(range, "criteria", average_range)

Example: Average spend where Membership = "Yes"

=AVERAGEIF(K2:K81,"Yes",H2:H81)

New function: COUNTIF with operators

Count cells matching a comparison

=COUNTIF(B2:B81,"<25")

Counts all customers aged under 25

KNOWLEDGE CHECK

Excel Formula Check

Which formula correctly counts the number of customers who are "Satisfied" or "Very Satisfied"?

Correct! COUNTIF only accepts one criterion at a time. To count multiple categories, add separate COUNTIF calls together. The result is 34 + 24 = 58 customers.

COUNTIF only takes a single criterion. To count multiple categories, you need to add two separate COUNTIF results together: =COUNTIF(...,"Satisfied") + COUNTIF(...,"Very Satisfied").

EXERCISE 4

Think Critically

These questions don't need formulas — use the data and your thinking

Q1: Population vs Sample

The shop serves approximately 2,000 unique customers per month. We surveyed 80. What is the population? What is the sample? What is a parameter we don't know? What statistic did we calculate?

Q2: Sampling Method

The 80 customers were surveyed by approaching people in the shop on a Tuesday afternoon. What sampling method is this? What potential bias exists?

Q3: Coverage Error

This survey only captures people who physically visit the shop. The shop also does Uber Eats deliveries. Is there a coverage error? How would you fix it?

Q4: Measurement Error

The Satisfaction variable was collected by a barista asking customers face-to-face: "How happy are you with us today?" Could this introduce measurement error? What type?

Discussion

If you could redesign this survey, what sampling method would you use and what would you change about how satisfaction is collected?

EXERCISE 5

Data Investigation

Use Excel to answer these business questions

Questions

Do members spend more on average than non-members? By how much?
Which suburb has the most customers? (Try using COUNTIF for each suburb, or use a PivotTable.)
Is the average wait time different for morning vs evening visitors?
What percentage of all customers are "Very Satisfied"?
What is the most common payment method among customers aged under 30? (This is a challenge — look at the data manually or try filtering.)

Helpful Formulas

Percentage:

=COUNTIF(J2:J81,"Very Satisfied")/COUNTA(J2:J81)

Try using Filters:
Click a column header, then Data → Filter. This lets you view subsets of the data without formulas.

Try a PivotTable:
Select the data, Insert → PivotTable. Drag Suburb to Rows and Customer_ID to Values to count.

KNOWLEDGE CHECK

Conceptual Check

The barista asks customers face-to-face: "How happy are you with us today?" and 72.5% report being Satisfied or Very Satisfied. This high rate is most likely influenced by:

Correct! When asked face-to-face by the person who served them, customers feel socially obligated to give a positive answer. This is the halo effect — a form of measurement error. An anonymous online survey would reduce this bias.

While other errors may exist, the most direct issue with face-to-face satisfaction questions asked by the service provider is the halo effect — customers feel pressure to be positive. This is a measurement error.

CHALLENGE

Design a Better Survey

Apply everything you've learned today

Your Brief

The coffee shop owner wants to understand why some customers visit frequently and others don't. Design a short survey (5–7 questions) that would help answer this.

Write 5–7 survey questions
Classify each variable you'd collect (nominal, ordinal, discrete, continuous)
Define your population and sampling frame
Choose a sampling method and justify it
Identify at least one potential error and how you'd mitigate it

Criteria for good questions

Include at least one of each: a nominal variable, an ordinal variable, a discrete variable, and a continuous variable.

Avoid common mistakes

No leading questions, no double-barrelled questions (asking two things at once), and make sure every question has clear response options.

Discuss with your group

Compare your survey designs. Could your classmates' surveys introduce any errors you hadn't considered?

WRAP UP

What You Practised Today

Data Classification

Classified 14 real variables as nominal, ordinal, discrete, or continuous — including tricky cases like Customer_ID and Visit_Time.

Excel Skills

Used COUNTA, AVERAGE, MIN, MAX, COUNTIF, and AVERAGEIF to compute descriptive statistics from raw data.

Critical Thinking

Identified population vs sample, recognised survey errors, and designed a better survey from scratch.

Next week we'll build on these foundations with data visualisation and probability.

Make sure to complete the tutorial problems (Q1.8–1.10) before next class.