Weight: 40% of final grade
Format: Individual forecasting project
Today's Focus: Steps 1 & 2 - Finding data and formulating your problem
Source: Online business datasets
Examples:
Good for: Operational forecasting, marketing, supply chain problems
Source: Stock markets, crypto, indices
Examples:
Good for: Investment decisions, portfolio management, risk analysis
| Source | URL | What You'll Find |
|---|---|---|
| Kaggle | kaggle.com/datasets | Retail sales, web analytics, supply chain, customer data. Search "time series" |
| UCI Repository | archive.ics.uci.edu/ml | Classic datasets: bike sharing, energy, sales, traffic |
| Google Dataset Search | datasetsearch.research.google.com | Search engine for datasets across all domains |
| data.gov.au | data.gov.au | Australian government data: tourism, energy, transport, health |
| Australian Bureau of Statistics | abs.gov.au | Retail trade, employment, building approvals, economic indicators |
| data.world | data.world | Business metrics, social data, economic data (requires free account) |
| Source | URL | What You'll Find |
|---|---|---|
| Yahoo Finance | finance.yahoo.com | Stock prices, indices, forex. Download CSV directly. Global markets. |
| ASX Data | asx.com.au | Australian stocks, indices (ASX200, All Ords). Historical prices. |
| FRED (Federal Reserve) | fred.stlouisfed.org | Economic indicators, interest rates, GDP, inflation, unemployment (US & global) |
| CoinMarketCap | coinmarketcap.com | Cryptocurrency prices, market cap, volume. Download historical data. |
| Investing.com | investing.com | Stocks, commodities, forex, indices. Historical data download available. |
| Quandl / Nasdaq Data Link | data.nasdaq.com | Financial & economic data. Free tier available (requires account). |
| Criterion | Requirement | Why It Matters |
|---|---|---|
| Time Points | ≥ 60 observations (monthly); ≥ 100 (weekly/daily) | Need enough data to identify patterns and validate forecasts |
| Frequency | Regular intervals (daily, weekly, monthly, quarterly) | Time series methods require consistent spacing |
| Completeness | < 10% missing values | Large gaps break forecasting models |
| Recency | Includes recent data (2020+) | Old data alone (pre-2015) may not be relevant for current decisions |
| Numeric Target | At least one continuous variable to forecast | Sales, price, volume, count - needs to be measurable |
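The requirements in this table can be screened with a few lines of code before committing to a dataset. The following is a minimal sketch using only the Python standard library; the function name `check_dataset`, its parameters, and the sample series are hypothetical and not part of the assignment. It treats a series as regular when every gap between timestamps matches the most common gap, with a few days of slack for calendar months, so daily trading data with weekend gaps would need a different rule.

```python
from collections import Counter
from datetime import date


def check_dataset(dates, values, min_obs=60, max_missing=0.10, recent_year=2020):
    """Screen a series against the dataset requirements (illustrative only).

    dates  : sorted list of datetime.date, one per observation
    values : list of numbers, with None marking a missing value
    """
    n = len(values)
    missing_share = sum(v is None for v in values) / n

    # Gaps (in days) between consecutive timestamps.
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    typical_gap = Counter(gaps).most_common(1)[0][0]

    # Calendar months and quarters vary by a few days, so allow slack there.
    slack = 3 if typical_gap >= 28 else 0
    regular = all(abs(g - typical_gap) <= slack for g in gaps)

    return {
        "enough_observations": n >= min_obs,
        "regular_frequency": regular,
        "complete_enough": missing_share < max_missing,
        "recent": dates[-1].year >= recent_year,
    }


# Hypothetical monthly series: January 2019 to December 2023 (60 points).
dates = [date(2019 + i // 12, i % 12 + 1, 1) for i in range(60)]
values = [100.0 + i for i in range(60)]
values[5] = None   # two missing months, about 3% of the series
values[17] = None

report = check_dataset(dates, values)
for criterion, passed in report.items():
    print(f"{criterion}: {'OK' if passed else 'FAIL'}")
```

For weekly or daily data, pass `min_obs=100` to match the table.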
You don't just "forecast the data." You solve a business problem using forecasting.
1. WHO is the stakeholder?
→ CEO? Marketing Manager? Investor? Supply Chain Director?
2. WHAT decision do they need to make?
→ Set budgets? Adjust inventory? Buy/sell stock? Hire staff?
3. WHY does the forecast matter?
→ What's the cost of being wrong? What's the financial impact?
Variables: Monthly revenue, customer count, marketing spend, competitor openings
"Forecast retail sales for next 12 months."
Why weak? No stakeholder, no decision, no business context, no cost structure.
Stakeholder: Regional Manager planning 2024 operations
Decision: Set monthly inventory budgets and staffing levels for Q1-Q2 2024
Cost Structure: Stockouts cost 3x more than overstocking (lost sales vs. holding costs)
Business Impact: Each 10% forecast error costs ~$50k in Q1 through inefficient inventory
Deliverable: Monthly sales forecast + recommended inventory levels + staffing plan with dollar impacts
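The cost structure in this example (stockouts cost 3x more than overstocking) can be made concrete with an asymmetric cost function. This is a rough sketch only: the function `inventory_cost`, the per-unit cost weights, and the demand numbers are hypothetical, chosen to show why two forecasts with the same average error can have very different business costs.

```python
def inventory_cost(forecast, actual, over_cost=1.0, under_cost=3.0):
    """Cost of stocking `forecast` units when `actual` units are demanded.

    over_cost  : cost per unit of excess stock (holding cost), assumed
    under_cost : cost per unit of unmet demand (lost sale), assumed 3x higher
    """
    if forecast >= actual:
        return over_cost * (forecast - actual)
    return under_cost * (actual - forecast)


# Hypothetical monthly demand and two competing forecasts.
actual = [100, 120, 90]
low_forecast = [90, 110, 80]     # under-forecasts by 10 units each month
high_forecast = [110, 130, 100]  # over-forecasts by 10 units each month

cost_low = sum(inventory_cost(f, a) for f, a in zip(low_forecast, actual))
cost_high = sum(inventory_cost(f, a) for f, a in zip(high_forecast, actual))

print("Cost of under-forecasting:", cost_low)
print("Cost of over-forecasting: ", cost_high)
```

Both forecasts have the same mean absolute error (10 units), yet under these assumed weights the under-forecast costs three times as much. That gap is the reason a problem statement should name the cost structure and not only an accuracy metric.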
Variables: Daily close price, volume, ASX200 index, interest rates
"Predict CBA stock price using ARIMA."
Why weak? No stakeholder, no investment decision, no risk analysis, prescribes method before analysis.
Stakeholder: Retail investor with $100k to invest
Decision: Buy, hold, or sell CBA stock for a 6-month horizon (Q1-Q2 2024)
Risk Tolerance: Moderate (willing to accept 10% downside for 15% upside potential)
Alternative: Compare to ASX200 index fund (benchmark return)
Deliverable: 6-month price forecast + buy/hold/sell recommendation + expected return vs. benchmark + risk assessment
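One way to connect a price forecast to the decision above is to express it as an expected return, compare it with the benchmark, and check the downside against the stated risk tolerance. The sketch below assumes a point forecast and a lower bound are already available from some model; the function `compare_to_benchmark` and every number in the example are hypothetical placeholders, not real CBA or ASX200 figures and not a recommended trading rule.

```python
def compare_to_benchmark(current_price, forecast_price, lower_bound,
                         benchmark_return, max_downside=0.10):
    """Summarise a price forecast relative to a benchmark (illustrative only).

    forecast_price   : point forecast at the end of the horizon
    lower_bound      : pessimistic price, e.g. lower edge of a forecast interval
    benchmark_return : expected return of the alternative over the same horizon
    max_downside     : largest loss the investor accepts (10% in the example)
    """
    expected_return = forecast_price / current_price - 1
    downside = 1 - lower_bound / current_price
    return {
        "expected_return": expected_return,
        "excess_return": expected_return - benchmark_return,
        "downside": downside,
        "within_risk_tolerance": downside <= max_downside,
    }


# Hypothetical inputs: price 100 today, forecast 112, pessimistic case 92,
# benchmark expected to return 5% over the same six months.
summary = compare_to_benchmark(100.0, 112.0, 92.0, benchmark_return=0.05)
for name, value in summary.items():
    print(name, value)
```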
Variables: Weekly visitors, conversion rate, marketing spend, sales revenue, seasonality
Stakeholder: Marketing Director with $500k annual budget
Decision: Optimize Q1 2024 marketing budget allocation across channels
Business Question: "What's the ROI of marketing spend? How much should we invest in Q1?"
Approach: Model visitors → sales relationship, test Granger causality, calculate $ impact per $1k spend
Deliverable: Q1 visitor forecast + sales forecast + recommended marketing spend ($X) + expected ROI (Y:1)
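The "$ impact per $1k spend" step in this approach can be approximated, as a first pass, by the slope of a least-squares line of sales on marketing spend. The sketch below is a simplification under stated assumptions: the helper `ols_slope` and the five data points are hypothetical, the slope measures association only, and it does not replace the Granger causality test or a model that accounts for seasonality and lags.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical quarterly figures, both in thousands of dollars.
spend_k = [10, 20, 30, 40, 50]
sales_k = [132, 158, 191, 219, 250]

slope = ols_slope(spend_k, sales_k)
print(f"Each extra $1k of spend is associated with ${slope:.2f}k of sales")
```

With these made-up numbers the slope is 2.97, which would translate to an ROI of roughly 3:1 only if the relationship is causal and stays linear at higher spend levels.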
Before finalizing your dataset and problem, check these boxes:
☐ Dataset meets requirements: ≥60 observations (monthly) or ≥100 (weekly/daily), regular frequency, <10% missing
☐ Clear stakeholder identified: Who needs this forecast? (title/role)
☐ Specific decision defined: What action will they take with the forecast?
☐ Cost structure understood: What errors cost more? Overestimate or underestimate?
☐ Business impact quantifiable: Can you express impact in dollars?
☐ Timeline specified: Forecast for how many periods ahead? (Q1 2024? Next 6 months?)
☐ Success criteria clear: What defines a "good" forecast? (Not just MAE!)
☐ Data patterns identifiable: Can you see trend/seasonality/relationships to analyze?
| Mistake | How to Fix It |
|---|---|
| Choosing dataset first, problem second | "I found cool data" → BAD. Start with "What business problem interests me?" then find data. |
| Generic problem statements | Don't say "forecast sales." Say "help Regional Manager set Q1 inventory budgets to minimize stockout costs." |
| No cost structure | Every business has asymmetric costs. Being wrong in one direction hurts more than the other. Identify this. |
| Focusing only on accuracy | A3 requires dollar-based ROI. "RMSE = 5.2" is not a business recommendation. "$50k potential savings" is. |
| Ignoring data quality issues | If data has 40% missing values or stops in 2018, find better data. Don't try to force it. |
| Too broad or too narrow | Too broad: "Forecast the economy." Too narrow: "Forecast sales on Tuesdays in March." Find a middle ground. |
| Prescribing method before analysis | Don't say "I'll use ARIMA." Analyze first, then match method to pattern + business need. |
Use this template to draft your A3 problem statement:
[Stakeholder role] at [Company/Organization] needs to [specific decision] for [time period]. Currently, [describe current situation/problem]. Being wrong costs approximately [$X] because [explain cost structure]. A reliable forecast would enable [specific action/benefit] with an estimated impact of [$Y].
The Regional Manager at CoffeeCo (15 locations) needs to set monthly inventory budgets and staffing levels for Q1-Q2 2024. Currently, budgets are based on simple year-over-year growth (+5%), missing seasonal patterns and COVID impacts. Being wrong costs approximately $50k per 10% error because stockouts cost 3x more than overstocking (lost sales vs. holding costs). A reliable forecast would enable optimized inventory purchasing and labor scheduling with an estimated impact of $150k savings in Q1-Q2.
"retail time series"
"sales forecasting"
"demand prediction"
"website traffic"
"supply chain"
"energy consumption"
ASX: CBA.AX, BHP.AX, WES.AX
Crypto: BTC, ETH, BNB
Indices: ^AXJO (ASX200), ^GSPC (S&P500)
Forex: AUDUSD, EURUSD
Need help? Bring your draft dataset + problem to your facilitator in Week 12 for feedback before committing!
Questions?