WEEK 1

Introduction to Statistics

Foundations for Data-Driven Decision Making

YOUR INSTRUCTOR

Stephen Vu

Instructor — Introduction to Statistics

Education

PhD in Recommender Systems
Queensland University of Technology (QUT)

Industry Experience

AI Specialist at Telstra

Teaching Experience

4 years of university teaching across:
QUT · Kaplan Business School · Central Queensland University

Contact

Available during consultation hours and via email

THE BIG QUESTION

Why study statistics?

Because every business decision you'll ever make depends on data.

Make Better Decisions

"Should we launch in Melbourne or Sydney first?" Statistics turns gut feelings into evidence-based answers.

Detect What's Hidden

Netflix uses statistics to predict what you'll binge next, and Spotify uses it to build your Discover Weekly. It's everywhere.

Career Advantage

Data literacy is one of the most sought-after skills across marketing, finance, consulting, and operations: virtually every field.

REAL-WORLD IMPACT

Statistics is already shaping your life

Retail & E-Commerce

Amazon uses A/B testing (a statistical method) to decide button colours, page layouts, and pricing — generating billions in extra revenue from tiny data-driven tweaks.

Healthcare

COVID vaccine trials relied on inferential statistics to determine efficacy from a sample of thousands, then applied findings to billions of people worldwide.

Transport

Uber's surge pricing model uses real-time statistical analysis of supply and demand patterns to set prices every few minutes across every city.

Finance

Banks use statistical models to calculate credit risk, decide loan approvals, and detect fraudulent transactions — all in milliseconds.

This unit gives you the foundation to understand, critique, and apply these techniques in your own career.

THIS WEEK

Learning Objectives

1. Define what statistics is and understand its relationship to business strategy
2. Learn key terminology: population, sample, parameter, statistic
3. Understand data collection methods and their trade-offs
4. Apply survey sampling methods: simple random, systematic, stratified, cluster
5. Classify measurement scales: nominal, ordinal, interval, ratio
6. Recognise common survey errors and how to avoid them

CONTEXT

From Strategy to Insight

Statistics sits at the heart of data-driven business

Business Strategy (long-term goals) → Data Collection (gather evidence) → Business Analytics (process & analyse) → Business Intelligence (actionable insight)

Data Mining

Used when data is huge (e.g. millions of supermarket transactions). Extracts patterns from big data.

Statistics

Used when data is scarce (e.g. crash test results). Makes inferences from limited samples.

DEFINITION

What is Statistics?

Statistics is the science of planning, collecting, analysing, and interpreting data to make informed decisions.

Descriptive Statistics

Collecting, summarising, and describing data. Closely related to Exploratory Data Analysis (EDA).

Example: "The average customer rating is 4.2 out of 5."

Inferential Statistics

Using sample data to estimate characteristics and make inferences about the whole population.

Example: "Based on our sample, we're 95% confident the true average is between 4.0 and 4.4."
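
To make the two branches concrete, here is a minimal Python sketch. The ratings are made-up illustrative data, not real survey results, and the interval uses the normal approximation (z = 1.96), which is only one of several ways to build a 95% confidence interval.

```python
import math
import statistics

# Hypothetical sample of customer ratings (1-5 scale); illustrative only.
ratings = [4, 5, 3, 4, 5, 4, 4, 3, 5, 5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4]

# Descriptive statistics: summarise the data we actually have.
sample_mean = statistics.mean(ratings)
sample_sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)

# Inferential statistics: estimate the unknown population mean.
# Normal-approximation 95% interval: mean +/- 1.96 * sd / sqrt(n).
margin = 1.96 * sample_sd / math.sqrt(len(ratings))
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"Descriptive: sample mean = {sample_mean:.2f}")
print(f"Inferential: 95% CI for the true mean = ({ci_low:.2f}, {ci_high:.2f})")
```

The first number only describes the 20 ratings in hand; the interval is a statement about all customers, including those never surveyed.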

KEY CONCEPTS

Population vs Sample

Population

The complete set of all possible elements.

All 50,000 Qantas passengers last month

Sample

A selected portion of the population.

500 passengers surveyed

Parameter

A characteristic of the population.

True average satisfaction of all 50,000

Statistic

A characteristic of the sample (used to estimate the parameter).

Average satisfaction of the 500 surveyed
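
The four terms can be sketched in Python as follows. The satisfaction scores are randomly generated stand-ins (an assumption for illustration; no real Qantas data is used), and in practice the parameter would be unknown, which is the whole reason for sampling.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: satisfaction scores (1-10) for 50,000 passengers.
population = [random.randint(1, 10) for _ in range(50_000)]

# Parameter: describes the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Sample: 500 passengers drawn at random, without replacement.
sample = random.sample(population, 500)

# Statistic: describes the sample and is used to estimate the parameter.
statistic = statistics.mean(sample)

print(f"Parameter (all 50,000): {parameter:.2f}")
print(f"Statistic (500 surveyed): {statistic:.2f}")
```

The two numbers will usually be close but not identical; that gap is what inferential statistics tries to quantify.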

TERMINOLOGY

Sampling Building Blocks

Element

An individual object being measured.

e.g. A registered voter in Victoria

Sampling Unit

Non-overlapping collections of elements.

e.g. Households (instead of individual voters)

Frame

A list of all sampling units available to sample from.

e.g. The electoral roll

Sample

The units actually drawn from the frame.

e.g. 1,000 voters selected from the roll

The frame may be smaller than the population — some elements might not be observable. This creates coverage error, which we'll discuss later.

KNOWLEDGE CHECK #1

Quick Quiz

A company surveys 200 of its 5,000 employees about workplace satisfaction. The average satisfaction score of the 200 employees is 7.3 out of 10. What is the 7.3?

Answer: 7.3 is a statistic. It is calculated from the sample of 200 employees and used to estimate the population parameter, which would describe all 5,000 employees.

DATA COLLECTION

Where does data come from?

Primary Source

You collect and analyse the data yourself. You control the process.

Running your own customer survey

Secondary Source

Someone else collected the data and made it available for your use.

Australian Bureau of Statistics data

COMPARISON

Primary Collection Methods

Method	Cost	Response Rate	Key Strength	Key Risk
Experiment	High	—	Controlled conditions, causal conclusions	Artificial setting may not reflect reality
Personal Interview	High	High	Can observe body language, clarify questions	Interviewer bias, costly training needed
Telephone Interview	Medium	Medium	Cheaper than face-to-face	Not everyone has a phone; calls can annoy respondents
Questionnaire	Low	Low	Cheap, scalable, web-friendly	Attracts extreme opinions; ambiguous questions
Direct Observation	Medium	—	Objective, real-time data	Limited to observable behaviours
Focus Group	Medium	High	Rich qualitative insight, open-ended	Groupthink, moderator influence

You want to measure customer satisfaction at a new restaurant. Which method would you choose and why?

KNOWLEDGE CHECK #2

Quick Quiz

A web-based questionnaire about public transport satisfaction is posted on a city council's website. Which type of error is most likely to be a concern?

Answer: Non-response bias. Self-administered web questionnaires tend to attract people with strong feelings (positive or negative), while those who are neutral rarely bother to respond.
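
A small simulation can show how this kind of bias distorts a result. Everything here is assumed for illustration: the population is randomly generated, and the response probabilities are invented to mimic "strong opinions respond more often".

```python
import random
import statistics

random.seed(2)

# Hypothetical population of 10,000 residents with satisfaction scores 1-5.
population = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

# Assumed response behaviour (illustrative): unhappy residents are the most
# likely to fill in a voluntary web form, neutral residents the least likely.
response_prob = {1: 0.50, 2: 0.15, 3: 0.05, 4: 0.10, 5: 0.20}

respondents = [s for s in population if random.random() < response_prob[s]]

true_mean = statistics.mean(population)
survey_mean = statistics.mean(respondents)

print(f"True mean satisfaction: {true_mean:.2f}")
print(f"Mean among respondents: {survey_mean:.2f}")
```

Under these assumed probabilities the survey understates satisfaction, even though every individual answer is honest.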

SAMPLING

How do we select a sample?

Two broad categories of sampling

Non-Probability

Items chosen without known probability of selection. Convenient but prone to bias.

e.g. Asking your friends, street intercepts

Probability

Items chosen with known probability. Allows valid statistical inference.

We'll learn 4 key methods →

SAMPLING METHOD 1

Simple Random Sampling

Every item in the frame has an equal chance of being selected.

Items are chosen using random number tables, Excel RAND(), or other software. (Figure: grid of items with the randomly selected ones highlighted.)

With replacement: Items returned to frame after selection (can appear twice)

Without replacement: Items removed after selection (more common in practice)
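
As one possible software route, the sketch below uses Python's standard `random` module. The frame of 100 numbered items is hypothetical.

```python
import random

random.seed(3)

frame = list(range(1, 101))  # hypothetical frame: items numbered 1..100

# Without replacement: each item can be selected at most once.
without_replacement = random.sample(frame, k=10)

# With replacement: an item goes back into the frame and may be drawn again.
with_replacement = random.choices(frame, k=10)

print("Without replacement:", sorted(without_replacement))
print("With replacement:   ", sorted(with_replacement))
```

`random.sample` never repeats an item, while `random.choices` can, which mirrors the two definitions above.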

SAMPLING METHOD 2

Systematic Sampling

Pick a random start among the first k items, then select every kth item, where k = N/n (N = frame size, n = sample size).

Start at item 3, then every 10th → items 3, 13, 23, 33, 43, 53, 63, 73, 83, 93

Example: N = 200 customers, n = 20 needed → k = 200/20 = 10. Random start at item 7, then pick items 7, 17, 27, 37…
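
The customer example can be reproduced with a short Python sketch. The function name `systematic_sample` is an illustrative choice, and the code assumes n divides N evenly, as it does here.

```python
import random

def systematic_sample(frame, n, start=None):
    """Return every k-th item of frame, beginning at a start in the first k items."""
    k = len(frame) // n  # sampling interval k = N / n (assumes n divides N)
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k
    # Items are numbered from 1, so item i sits at index i - 1.
    return [frame[i - 1] for i in range(start, len(frame) + 1, k)]

customers = list(range(1, 201))  # N = 200 customers, numbered 1..200
picked = systematic_sample(customers, n=20, start=7)
print(picked[:4])   # [7, 17, 27, 37]
print(len(picked))  # 20
```

Omitting `start` lets the function choose the random start itself, which is what you would do in a real survey.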

SAMPLING METHOD 3

Stratified Sampling

Divide the frame into strata (groups with similar characteristics), then randomly sample from each.

(Figure: frame grouped into high-, medium-, and low-income strata, with the sampled items highlighted.)

Key idea: Ensures every subgroup is represented proportionally. If 20% of cameras sold are Pentax, then 20% of the sample should be Pentax.
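
Proportional allocation can be sketched in Python as follows. The frame is invented: only the 20% Pentax share comes from the example above, while the other two brands and their shares are assumptions added for illustration.

```python
import random

random.seed(4)

# Hypothetical frame of 1,000 camera buyers, tagged by brand (the stratum).
frame = (
    [("Pentax", i) for i in range(200)]    # 20% of sales
    + [("Canon", i) for i in range(500)]   # 50% of sales (assumed)
    + [("Nikon", i) for i in range(300)]   # 30% of sales (assumed)
)

def stratified_sample(frame, n):
    """Proportional allocation: each stratum gets n * (stratum share) draws."""
    strata = {}
    for brand, customer_id in frame:
        strata.setdefault(brand, []).append((brand, customer_id))
    sample = []
    for brand, members in strata.items():
        quota = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, quota))  # simple random sample within the stratum
    return sample

sample = stratified_sample(frame, n=100)
counts = {b: sum(1 for brand, _ in sample if brand == b)
          for b in ("Pentax", "Canon", "Nikon")}
print(counts)  # {'Pentax': 20, 'Canon': 50, 'Nikon': 30}
```

With less convenient shares the rounded quotas may not add up exactly to n, so a real design would need a rule for handling the remainder.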

SAMPLING METHOD 4

Cluster Sampling

Divide the frame into clusters (each representative of the population). Randomly select clusters, then study everyone in those clusters.

(Figure: frame split into clusters, some selected in full and the rest not selected.)

Key difference from stratified: In stratified, you sample within every stratum. In cluster, you select entire clusters and skip others.
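
For contrast with stratified sampling, here is a minimal Python sketch of cluster sampling. The eight regions and their customer labels are hypothetical.

```python
import random

random.seed(5)

# Hypothetical frame: 8 regions (clusters), each with 50 customers.
clusters = {
    f"Region {r}": [f"R{r}-C{c}" for c in range(1, 51)]
    for r in range(1, 9)
}

# Stage 1: randomly select whole clusters.
chosen_regions = random.sample(sorted(clusters), k=2)

# Stage 2: include every customer in the chosen clusters (no further sampling).
sample = [cust for region in chosen_regions for cust in clusters[region]]

print("Selected clusters:", chosen_regions)
print("Sample size:", len(sample))  # 100
```

Six of the eight regions contribute nobody to the sample, which is exactly what would not happen under stratified sampling.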

COMPARISON

Sampling Methods at a Glance

Method	How It Works	Best When…	Watch Out For…
Simple Random	Every item has equal chance	Frame is available and not too large	Can be impractical for very large populations
Systematic	Every kth item after random start	Frame is ordered (e.g. customer list)	Hidden periodicity in the list could cause bias
Stratified	Random sample within each subgroup	Population has distinct subgroups you want represented	Need to know the strata in advance
Cluster	Randomly select whole clusters	Population is geographically spread out	Clusters may not be truly representative

KNOWLEDGE CHECK #3

Quick Quiz

A supermarket chain wants to survey customers. They divide all stores into regions, randomly pick 5 regions, and survey every customer in those regions. What sampling method is this?

Answer: Cluster sampling. They randomly picked whole groups (regions) and surveyed everyone inside them. If they had sampled within every region instead, it would be stratified sampling.

MEASUREMENT

Types of Data

First, the big split: Categorical vs Numerical

Categorical

Responses are categories or labels.

Yes/No, Male/Female, Telstra/Optus/Vodafone

Sub-types: Nominal, Ordinal

Numerical

Responses are numbers with quantitative meaning.

Height 1.7m, Weight 72kg, Time 5hr

Sub-types: Discrete, Continuous

Discrete: Counting → whole numbers. Number of students in a class.

Continuous: Measuring → any value. Speed of a car, your weight.

LEVELS OF MEASUREMENT

The Measurement Ladder

Each level adds a new property

Nominal: categories only (e.g. Male / Female)
Ordinal: + order (e.g. 1st / 2nd / 3rd)
Interval: + equal spacing (e.g. temperature in °C)
Ratio: + true zero (e.g. height, weight)

Is the Richter scale (earthquake magnitude) ratio data? No! A magnitude 6 quake has 10× the shaking amplitude of a 5, but a 7 has 100× (not 20×) that of a 5. Magnitude values don't behave like proportional ratios, so treat the scale as interval data.
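
The arithmetic behind the Richter example can be checked in a few lines of Python. This sketch assumes the standard base-10 logarithmic definition, where each whole step in magnitude multiplies the measured amplitude by 10; the function name `amplitude_ratio` is just an illustrative label.

```python
def amplitude_ratio(magnitude_a, magnitude_b):
    """How many times larger the amplitude at magnitude_a is than at magnitude_b."""
    return 10 ** (magnitude_a - magnitude_b)

print(amplitude_ratio(6, 5))  # 10
print(amplitude_ratio(7, 5))  # 100
print(7 / 5)                  # 1.4, which says nothing about relative strength
```

Dividing one magnitude by another gives 1.4, a number with no physical meaning here, which is why the magnitudes themselves are not ratio data.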

QUICK REFERENCE

Scales Cheat Sheet

Scale	Categories?	Order?	Equal Spacing?	True Zero?	Example
Nominal	Yes	No	No	No	Airline name, Gender, ABN
Ordinal	Yes	Yes	No	No	S&P ratings, Survey ranking, Socioeconomic class
Interval	Yes	Yes	Yes	No	Temperature (°C), Richter scale, Children's clothing size
Ratio	Yes	Yes	Yes	Yes	Height, Weight, Income, Time, ASX 200 index

Is currency discrete or continuous? Strictly discrete ($1.50 or $1.51, nothing between). But petrol at 149.9¢ shows fractions of cents — so currency is often treated as continuous in practice.

KNOWLEDGE CHECK #4

Quick Quiz

The number of tourists arriving in Australia each month is best classified as:

Answer: Discrete, ratio. We're counting whole people (you can't have 2.5 tourists arriving), so the data is discrete, and zero tourists is a meaningful "nothing" (a true zero), so it's measured on a ratio scale.

WORKED EXAMPLE

Mobile Phone Variables

Classify each variable: Categorical or Numerical? What level?

Variable	Type	Sub-type	Scale	Reasoning
Phones per household	Numerical	Discrete	Ratio	Counting whole phones; true zero
Service provider	Categorical	—	Nominal	Names of companies; no natural order
Texts sent per month	Numerical	Discrete	Ratio	Counting messages; true zero
Longest call (minutes)	Numerical	Continuous	Ratio	Measuring time; true zero
Phone colour	Categorical	—	Nominal	Colour names; no natural order
Monthly charge ($)	Numerical	Discrete	Ratio	Money to nearest cent; true zero
Owns car charge kit?	Categorical	—	Nominal	Yes/No; no order
Calls per month	Numerical	Discrete	Ratio	Counting calls; true zero
Satisfaction level	Categorical	—	Ordinal	Ordered categories (very satisfied → very dissatisfied)

WORKED EXAMPLE

Digital Camera Survey

An electronics manager surveys customers using warranty cards

Population vs Frame

Population: All customers who bought a digital camera in the past 12 months.

Frame: Customers who returned their warranty card — likely smaller and potentially biased.

Sampling Strategy

To compare brands → use stratified sampling with brands as strata. Ensure each brand's proportion in the sample matches its proportion in sales.

Categorical Questions

What is your gender? What brand did you buy? How satisfied are you? (Likert scale 1–5)

Numerical Questions

What price did you pay? How many months ago did you buy it? How many times have you had it repaired?

WATCH OUT

Survey Errors

Errors can creep in at every stage of the survey process

1. Define Frame → Coverage Error: some groups are excluded from the frame, causing selection bias
2. Select Sample → Sampling Error: chance differences between possible samples, quantified by the margin of error
3. Collect Data → Non-Response Error: not everyone responds, causing non-response bias
4. Ask Questions → Measurement Error: ambiguous wording, halo effect, leading questions
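
Of the four errors, sampling error is the one that arises even when the survey is run perfectly. The Python sketch below illustrates it by drawing many samples from the same hypothetical population (randomly generated scores, assumed for illustration) and comparing their means.

```python
import random
import statistics

random.seed(6)

# Hypothetical population of 5,000 satisfaction scores (1-10).
population = [random.randint(1, 10) for _ in range(5_000)]
true_mean = statistics.mean(population)

# Draw 1,000 different samples of 200 and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 200)) for _ in range(1_000)
]

# The spread of these means is sampling error: it comes from chance alone.
spread = statistics.stdev(sample_means)
print(f"True mean: {true_mean:.2f}")
print(f"Sample means range from {min(sample_means):.2f} to {max(sample_means):.2f}")
print(f"Standard deviation of sample means: {spread:.3f}")
```

Every sample here is a valid simple random sample, yet no two give exactly the same answer. Larger samples shrink this spread, whereas the other three errors do not go away with sample size.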

KNOWLEDGE CHECK #5

Quick Quiz

An interviewer asks: "You like our new product, don't you?" What type of error does this introduce?

Answer: Measurement error. This is a leading question: the phrasing pushes the respondent toward saying "yes".

BRAIN TEASER

Think About This…

Names are drawn from a fishbowl, where each name is written on a folded piece of paper. If the papers are of different sizes, is this simple random sampling?

Answer: No. Everyone is in the bowl, but simple random sampling requires an equal probability of selection for every item. Larger papers are easier to grab, so the probabilities aren't equal. This is a sneaky source of bias!
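
The fishbowl can be simulated in Python. The names are hypothetical, and the sketch assumes that a paper's chance of being grabbed is proportional to its size, which is a simplification made purely for illustration.

```python
import random
from collections import Counter

random.seed(7)

# Hypothetical fishbowl: four names; "Dana" is on a paper twice as large.
names = ["Alex", "Blake", "Casey", "Dana"]
paper_size = [1, 1, 1, 2]  # assumption: chance of selection is proportional to size

# Simulate 10,000 single draws, weighting each name by its paper size.
draws = Counter(random.choices(names, weights=paper_size, k=10_000))

for name in names:
    print(f"{name}: drawn {draws[name] / 10_000:.1%} of the time")
```

Under simple random sampling each name would come up about 25% of the time; with the larger paper, one name is drawn noticeably more often than the others.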

SUMMARY

Week 1 Takeaways

Statistics = Evidence

It's the tool that turns limited data into actionable business intelligence. Descriptive statistics summarise the data you have; inferential statistics generalise from a sample to the population.

Know Your Terms

Population → Sample. Parameter → Statistic. Frame may ≠ Population. These distinctions matter in every analysis.

Sampling Matters

Simple random, systematic, stratified, cluster — each has strengths. Choose based on your population structure and resources.

Scales Shape Analysis

Nominal → Ordinal → Interval → Ratio. The scale determines which statistical methods you can use on the data.
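
As a rough illustration of that last point, the Python sketch below pairs each scale with a summary that is commonly considered appropriate for it. The data values are invented, and the pairing follows the usual textbook convention rather than a hard rule.

```python
import statistics

# Hypothetical survey answers, one variable per measurement scale.
provider = ["Telstra", "Optus", "Telstra", "Vodafone", "Telstra"]  # nominal
satisfaction = [1, 2, 2, 4, 5]  # ordinal codes: 1 = very dissatisfied, 5 = very satisfied
temperature_c = [18.0, 21.5, 19.0, 24.5, 22.0]                     # interval
monthly_calls = [12, 40, 25, 8, 15]                                # ratio

# Nominal: only counting makes sense, so report the mode.
print("Most common provider:", statistics.mode(provider))        # Telstra

# Ordinal: order is meaningful, so the median is safe.
print("Median satisfaction:", statistics.median(satisfaction))   # 2

# Interval: differences are meaningful, so a mean is appropriate.
print("Mean temperature:", statistics.mean(temperature_c))       # 21.0

# Ratio: a true zero also makes "times as many" statements valid.
print("Calls, max vs min:", max(monthly_calls) / min(monthly_calls))  # 5.0
```

Taking the mean of provider names is impossible, and dividing one Celsius temperature by another is meaningless, which is the practical consequence of the ladder.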

NEXT STEPS

After This Lecture

Read

Black et al. 2019 — Chapters 1 & 7 (sections in Study Guide)

Practice

Complete recommended problems from the Study Guide (Questions 1.8–1.10)

Excel Practice

Get comfortable with basics — try the RAND() function for random sampling

Prepare for Tutorial

Think about the discussion points — come ready to participate!

See you in the tutorial! — Stephen