Customer records
Variables to classify
Excel sheets
Open Week1_Coffee_Survey_Data.xlsx and follow along
A new coffee shop in Melbourne's CBD has surveyed 80 customers about their habits and preferences. The data has been collected and entered into a spreadsheet.
Your job today is to explore the dataset, classify variables, compute descriptive statistics, and answer questions using Excel.
Is this dataset a population or a sample? What would the population be?
80 rows of raw survey data with 14 variables
Variable descriptions with columns for you to fill in data type and scale
Pre-built formulas and blank cells for your own analysis
| Variable | Example Values | What It Records |
|---|---|---|
| Customer_ID | CUST-1001, CUST-1002 | Unique identifier |
| Age | 24, 37, 55 | Customer age in years |
| Gender | Male, Female, Non-binary | Self-reported gender |
| Suburb | CBD, Fitzroy, St Kilda | Where the customer lives |
| Preferred_Drink | Flat White, Latte, Mocha | Most ordered drink |
| Monthly_Visits | 3, 8, 15 | Visits per month |
| Items_Per_Visit | 1, 2, 3 | Items bought each visit |
| Avg_Spend_AUD | $8.50, $14.20, $22.75 | Average spend per visit |
| Wait_Time_Min | 2.3, 5.1, 8.7 | Average wait in minutes |
| Satisfaction | Satisfied, Very Satisfied | Overall satisfaction level |
| Membership | Yes, No | Loyalty membership status |
| Distance_KM | 0.5, 2.3, 7.8 | Distance from home (km) |
| Payment_Method | Cash, Card, Mobile Pay | Preferred payment type |
| Visit_Time | Morning, Afternoon, Evening | Typical visit time of day |
Customer_ID looks numerical, but ask yourself: does it make sense to calculate the average Customer_ID? If the answer is no, it's probably not numerical data.
| Variable | Type | Sub-type | Scale | Key Reasoning |
|---|---|---|---|---|
| Customer_ID | Categorical | -- | Nominal | An identifier, not a meaningful number |
| Age | Numerical | Discrete | Ratio | Whole years, true zero exists |
| Gender | Categorical | -- | Nominal | Labels with no natural order |
| Suburb | Categorical | -- | Nominal | Location names, no order |
| Preferred_Drink | Categorical | -- | Nominal | Drink names, no natural ranking |
| Monthly_Visits | Numerical | Discrete | Ratio | Counted whole visits, true zero |
| Items_Per_Visit | Numerical | Discrete | Ratio | Counted whole items |
| Avg_Spend_AUD | Numerical | Continuous | Ratio | Measured, $0 is meaningful zero |
| Wait_Time_Min | Numerical | Continuous | Ratio | Measured time, true zero |
| Satisfaction | Categorical | -- | Ordinal | Ordered categories |
| Membership | Categorical | -- | Nominal | Yes/No, no ranking |
| Distance_KM | Numerical | Continuous | Ratio | Measured distance, true zero |
| Payment_Method | Categorical | -- | Nominal | Payment types, no natural order |
| Visit_Time | Categorical | -- | Ordinal | Morning → Afternoon → Evening has a natural order |
COUNTA — counts non-empty cells
AVERAGE — calculates mean
MIN / MAX — smallest / largest value
COUNTIF — counts cells matching a condition
AVERAGEIF — averages cells matching a condition
Result: 80 customers
Result: 36.70 years
Result: 35 male customers
Averages cells where a condition is met
Example: Average spend where Membership = "Yes"
Count cells matching a comparison
Counts all customers aged under 25
The shop serves approximately 2,000 unique customers per month. We surveyed 80. What is the population? What is the sample? What is a parameter we don't know? What statistic did we calculate?
The 80 customers were surveyed by approaching people in the shop on a Tuesday afternoon. What sampling method is this? What potential bias exists?
This survey only captures people who physically visit the shop. The shop also does Uber Eats deliveries. Is there a coverage error? How would you fix it?
The Satisfaction variable was collected by a barista asking customers face-to-face: "How happy are you with us today?" Could this introduce measurement error? What type?
If you could redesign this survey, what sampling method would you use and what would you change about how satisfaction is collected?
Percentage:
Try using Filters:
Click a column header, then Data → Filter. This lets you view subsets of the data without formulas.
Try a PivotTable:
Select the data, Insert → PivotTable. Drag Suburb to Rows and Customer_ID to Values to count.
The coffee shop owner wants to understand why some customers visit frequently and others don't. Design a short survey (5–7 questions) that would help answer this.
Include at least one of each: a nominal variable, an ordinal variable, a discrete variable, and a continuous variable.
No leading questions, no double-barrelled questions (asking two things at once), and make sure every question has clear response options.
Compare your survey designs. Could your classmates' surveys introduce any errors you hadn't considered?
Classified 14 real variables as nominal, ordinal, discrete, or continuous — including tricky cases like Customer_ID and Visit_Time.
Used COUNTA, AVERAGE, MIN, MAX, COUNTIF, and AVERAGEIF to compute descriptive statistics from raw data.
Identified population vs sample, recognised survey errors, and designed a better survey from scratch.
Make sure to complete the tutorial problems (Q1.8–1.10) before next class.