PhD in Recommender Systems
Queensland University of Technology (QUT)
AI Specialist at Telstra
4 years of university teaching across:
QUT · Kaplan Business School · Central Queensland University
Available during consultation hours and via email
"Should we launch in Melbourne or Sydney first?" Statistics turns gut feelings into evidence-based answers.
Netflix uses statistics to predict what you'll binge next; Spotify uses it to build your Discover Weekly. It's everywhere.
Data literacy is one of the skills employers seek most across marketing, finance, consulting, operations and every other field.
Amazon uses A/B testing (a statistical method) to decide button colours, page layouts, and pricing — generating billions in extra revenue from tiny data-driven tweaks.
COVID vaccine trials relied on inferential statistics to determine efficacy from a sample of thousands, then applied findings to billions of people worldwide.
Uber's surge pricing model uses real-time statistical analysis of supply and demand patterns to set prices every few minutes across every city.
Banks use statistical models to calculate credit risk, decide loan approvals, and detect fraudulent transactions — all in milliseconds.
This unit gives you the foundation to understand, critique, and apply these techniques in your own career.
Define what statistics is and understand its relationship to business strategy
Learn key terminology: population, sample, parameter, statistic
Understand data collection methods and their trade-offs
Apply survey sampling methods: simple random, systematic, stratified, cluster
Classify measurement scales: nominal, ordinal, interval, ratio
Recognise common survey errors and how to avoid them
Long-term goals
Gather evidence
Process & analyse
Actionable insight
Used when data is huge (e.g. millions of supermarket transactions). Extracts patterns from big data.
Used when data is scarce (e.g. crash test results). Makes inferences from limited samples.
Statistics is the science of planning, collecting, analysing, and interpreting data to make informed decisions.
Collecting, summarising, and describing data. Also called Exploratory Data Analysis (EDA).
Example: "The average customer rating is 4.2 out of 5."
Using sample data to estimate characteristics and make inferences about the whole population.
Example: "Based on our sample, we're 95% confident the true average is between 4.0 and 4.4."
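A minimal Python sketch of the two examples above, assuming a small made-up list of ratings (`ratings` is hypothetical, not data from this unit) and a normal approximation for the interval. The mean is the descriptive part; the confidence interval is the inferential part.

```python
import math
import statistics

# Hypothetical customer ratings on a 1-5 scale (illustrative data only).
ratings = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 3, 4, 5, 4, 5,
           4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 4, 3, 5, 4, 4, 5, 4, 5, 4, 4]

n = len(ratings)
mean = statistics.mean(ratings)    # descriptive: summarises the sample
sd = statistics.stdev(ratings)     # sample standard deviation (divides by n - 1)

# Inferential: approximate 95% confidence interval for the population mean,
# assuming the sample is random and large enough for a normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
margin = z * sd / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"Sample mean: {mean:.2f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

With a small sample, a t-based interval would usually be preferred; the normal version is used here only to keep the sketch short.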
The complete set of all possible elements.
All 50,000 Qantas passengers last month
A selected portion of the population.
500 passengers surveyed
A characteristic of the population.
True average satisfaction of all 50,000
A characteristic of the sample (used to estimate the parameter).
Average satisfaction of the 500 surveyed
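The four terms can be seen side by side in a short simulation. This is only a sketch: the satisfaction scores are randomly generated (an assumption, since the real passenger data is not available), and the 1 to 10 scoring scale is hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Population: hypothetical satisfaction scores (1-10) for all 50,000 passengers.
population = [rng.randint(1, 10) for _ in range(50_000)]

# Parameter: a characteristic of the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Sample: 500 passengers drawn at random, without replacement.
sample = rng.sample(population, 500)

# Statistic: a characteristic of the sample, used to estimate the parameter.
statistic = statistics.mean(sample)

print(f"Parameter (population mean): {parameter:.2f}")
print(f"Statistic (sample mean):     {statistic:.2f}")
```

The two printed values will be close but rarely identical, which is why the statistic is described as an estimate of the parameter.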
An individual object being measured.
e.g. A registered voter in Victoria
Non-overlapping collections of elements.
e.g. Households (instead of individual voters)
A list of all sampling units available to sample from.
e.g. The electoral roll
The units actually drawn from the frame.
e.g. 1,000 voters selected from the roll
The frame may be smaller than the population — some elements might not be observable. This creates coverage error, which we'll discuss later.
You collect and analyse the data yourself. You control the process.
Running your own customer survey
Someone else collected the data and made it available for your use.
Australian Bureau of Statistics data
| Method | Cost | Response Rate | Key Strength | Key Risk |
|---|---|---|---|---|
| Experiment | High | — | Controlled conditions, causal conclusions | Artificial setting may not reflect reality |
| Personal Interview | High | High | Can observe body language, clarify questions | Interviewer bias, costly training needed |
| Telephone Interview | Medium | Medium | Cheaper than face-to-face | Not everyone has a phone; high annoyance |
| Questionnaire | Low | Low | Cheap, scalable, web-friendly | Attracts extreme opinions, ambiguous questions |
| Direct Observation | Medium | — | Objective, real-time data | Limited to observable behaviours |
| Focus Group | Medium | High | Rich qualitative insight, open-ended | Groupthink, moderator influence |
You want to measure customer satisfaction at a new restaurant. Which method would you choose and why?
Items chosen without known probability of selection. Convenient but prone to bias.
e.g. Asking your friends, street intercepts
Items chosen with known probability. Allows valid statistical inference.
We'll learn 4 key methods.
Every item in the frame has an equal chance of being selected.
Items are chosen using random number tables, Excel RAND(), or software.
With replacement: Items returned to frame after selection (can appear twice)
Without replacement: Items removed after selection (more common in practice)
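In software, the two variants map onto two different functions. A minimal sketch using Python's standard `random` module, assuming a hypothetical frame of 100 numbered items:

```python
import random

rng = random.Random(7)        # fixed seed so the sketch is repeatable
frame = list(range(1, 101))   # hypothetical frame: items numbered 1 to 100

# Without replacement: each item can be selected at most once.
without_replacement = rng.sample(frame, k=10)

# With replacement: a selected item goes back into the frame and may be drawn again.
with_replacement = rng.choices(frame, k=10)

print("Without replacement:", sorted(without_replacement))
print("With replacement:   ", sorted(with_replacement))
```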
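Pick a random start among the first k items, then select every kth item after it. Here k = N/n, where N is the number of items in the frame and n is the sample size.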
Start at item 3, then every 10th → items 3, 13, 23, 33, 43, 53, 63, 73, 83, 93
Example: N = 200 customers, n = 20 needed → k = 200/20 = 10. Random start at item 7, then pick items 7, 17, 27, 37…
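The same procedure as a short Python sketch. The function name `systematic_sample` and the customer numbering are assumptions made for illustration; the random start is chosen among the first k items.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start among the first k items, then take every kth item."""
    k = len(frame) // n              # sampling interval, k = N / n
    start = rng.randrange(k)         # 0-based position within the first k items
    return frame[start::k][:n]

customers = list(range(1, 201))      # hypothetical frame: customers numbered 1 to 200
selected = systematic_sample(customers, 20, random.Random(1))

print(selected)
```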
Divide the frame into strata (groups with similar characteristics), then randomly sample from each.
Key idea: Ensures every subgroup is represented proportionally. If 20% of cameras sold are Pentax, then 20% of the sample should be Pentax.
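A sketch of proportional stratified sampling, assuming a made-up frame of 1,000 buyers in which 20% bought Pentax. "Brand B" and "Brand C" are placeholder names, and `stratified_sample` is a hypothetical helper.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 1,000 camera buyers tagged by brand (20% Pentax).
frame = ([("Pentax", i) for i in range(200)]
         + [("Brand B", i) for i in range(500)]
         + [("Brand C", i) for i in range(300)])

def stratified_sample(frame, n, rng):
    """Randomly sample from every stratum, in proportion to its share of the frame."""
    strata = {}
    for brand, customer_id in frame:
        strata.setdefault(brand, []).append((brand, customer_id))
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can shift the total slightly in general.
        share = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, share))
    return sample

sample = stratified_sample(frame, 100, rng)

counts = {}
for brand, _ in sample:
    counts[brand] = counts.get(brand, 0) + 1
print(counts)
```

Every brand appears in the sample, and Pentax makes up 20 of the 100 sampled buyers, matching its 20% share of the frame.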
Divide the frame into clusters (each representative of the population). Randomly select clusters, then study everyone in those clusters.
Key difference from stratified: In stratified, you sample within every stratum. In cluster, you select entire clusters and skip others.
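For contrast, a minimal cluster sampling sketch. The store names and sizes are assumptions for illustration: whole clusters are selected at random, and everyone inside a selected cluster is studied.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 20 clusters (stores), each with 50 customers.
clusters = {
    f"store_{c}": [f"store_{c}_cust_{i}" for i in range(50)]
    for c in range(20)
}

# Step 1: randomly select whole clusters; the other clusters are skipped entirely.
chosen = rng.sample(sorted(clusters), k=4)

# Step 2: study everyone in the selected clusters.
sample = [customer for name in chosen for customer in clusters[name]]

print(chosen)
print(len(sample), "customers sampled")
```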
| Method | How It Works | Best When… | Watch Out For… |
|---|---|---|---|
| Simple Random | Every item has equal chance | Frame is available and not too large | Can be impractical for very large populations |
| Systematic | Every kth item after random start | Frame is ordered (e.g. customer list) | Hidden periodicity in the list could cause bias |
| Stratified | Random sample within each subgroup | Population has distinct subgroups you want represented | Need to know the strata in advance |
| Cluster | Randomly select whole clusters | Population is geographically spread out | Clusters may not be truly representative |
Responses are categories or labels.
Yes/No, Male/Female, Telstra/Optus/Vodafone
Responses are numbers with quantitative meaning.
Height 1.7 m, Weight 72 kg, Time 5 hr
Discrete: Counting → whole numbers. Number of students in a class.
Continuous: Measuring → any value. Speed of a car, your weight.
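Is the Richter scale (earthquake magnitude) ratio data? No! The scale is logarithmic: a 6 is 10× stronger than a 5, but a 7 is 100× (not 20×) stronger than a 5. Each step multiplies strength by 10 rather than adding a fixed amount, so ratios of magnitudes aren't meaningful → it's interval data.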
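The arithmetic behind the Richter example can be checked in a few lines. This sketch assumes "stronger" refers to measured wave amplitude, which grows tenfold per whole magnitude step; `amplitude_ratio` is a hypothetical helper name.

```python
def amplitude_ratio(magnitude_a, magnitude_b):
    """How many times larger the amplitude is at magnitude_a than at magnitude_b."""
    return 10 ** (magnitude_a - magnitude_b)

print(amplitude_ratio(6, 5))   # one step up the scale
print(amplitude_ratio(7, 5))   # two steps up the scale
print(7 / 5)                   # ratio of the scale values themselves
```

The ratio of the scale values (7 / 5) says nothing about how much stronger the earthquake is, which is the reason the scale is not treated as ratio data.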
| Scale | Categories? | Order? | Equal Spacing? | True Zero? | Example |
|---|---|---|---|---|---|
| Nominal | Yes | No | No | No | Airline name, Gender, ABN |
| Ordinal | Yes | Yes | No | No | S&P ratings, Survey ranking, Socioeconomic class |
| Interval | Yes | Yes | Yes | No | Temperature (°C), Richter scale, Children's clothing size |
| Ratio | Yes | Yes | Yes | Yes | Height, Weight, Income, Time, ASX 200 index |
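The table can be read as a sequence of three yes/no questions. A small sketch of that decision logic, where `measurement_scale` is a hypothetical helper written only to mirror the table's columns:

```python
def measurement_scale(ordered, equal_spacing, true_zero):
    """Return the scale implied by the Order / Equal Spacing / True Zero columns."""
    if not ordered:
        return "nominal"
    if not equal_spacing:
        return "ordinal"
    if not true_zero:
        return "interval"
    return "ratio"

# Examples taken from the table above.
print(measurement_scale(False, False, False))  # airline name
print(measurement_scale(True, False, False))   # S&P ratings
print(measurement_scale(True, True, False))    # temperature in °C
print(measurement_scale(True, True, True))     # height
```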
Is currency discrete or continuous? Strictly discrete ($1.50 or $1.51, nothing between). But petrol at 149.9¢ shows fractions of cents — so currency is often treated as continuous in practice.
| Variable | Type | Sub-type | Scale | Reasoning |
|---|---|---|---|---|
| Phones per household | Numerical | Discrete | Ratio | Counting whole phones; true zero |
| Service provider | Categorical | — | Nominal | Names of companies; no natural order |
| Texts sent per month | Numerical | Discrete | Ratio | Counting messages; true zero |
| Longest call (minutes) | Numerical | Continuous | Ratio | Measuring time; true zero |
| Phone colour | Categorical | — | Nominal | Colour names; no natural order |
| Monthly charge ($) | Numerical | Discrete | Ratio | Money to nearest cent; true zero |
| Owns car charge kit? | Categorical | — | Nominal | Yes/No; no order |
| Calls per month | Numerical | Discrete | Ratio | Counting calls; true zero |
| Satisfaction level | Categorical | — | Ordinal | Ordered categories (very satisfied → very dissatisfied) |
Population: All customers who bought a digital camera in the past 12 months.
Frame: Customers who returned their warranty card — likely smaller and potentially biased.
To compare brands → use stratified sampling with brands as strata. Ensure each brand's proportion in the sample matches its proportion in sales.
What is your gender? What brand did you buy? How satisfied are you? (Likert scale 1–5)
What price did you pay? How many months ago? How many times have you had it repaired?
Some groups excluded from the frame → selection bias
Chance differences between possible samples → margin of error
Not everyone responds → non-response bias
Ambiguous wording, halo effect, leading questions
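Sampling error in particular is easy to see in a simulation. The sketch below assumes a made-up population of 10,000 ratings and draws five samples of 100 from it; nothing changes between draws except chance.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 ratings on a 1-5 scale.
population = [rng.randint(1, 5) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw five separate random samples of 100 and record each sample mean.
sample_means = [statistics.mean(rng.sample(population, 100)) for _ in range(5)]

# Spread between the highest and lowest sample mean.
spread = max(sample_means) - min(sample_means)

print(f"Population mean: {population_mean:.2f}")
print("Sample means:   ", [f"{m:.2f}" for m in sample_means])
print(f"Spread:          {spread:.2f}")
```

Larger samples shrink this spread but never remove it entirely, whereas coverage, non-response and measurement errors do not shrink just because the sample is bigger.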
Statistics is the tool that turns limited data into actionable business intelligence. Descriptive summarises; inferential generalises from a sample to the population.
Population → Sample. Parameter → Statistic. Frame may ≠ Population. These distinctions matter in every analysis.
Simple random, systematic, stratified, cluster — each has strengths. Choose based on your population structure and resources.
Nominal → Ordinal → Interval → Ratio. The scale determines which statistical methods you can use on the data.
Black et al. 2019 — Chapters 1 & 7 (sections in Study Guide)
Complete recommended problems from the Study Guide (Questions 1.8–1.10)
Get comfortable with basics — try the RAND() function for random sampling
Think about the discussion points — come ready to participate!
See you in the tutorial! — Stephen