DATA4400
Data-Driven Forecasting

Week 2: Generative AI in Forecasting and Moving Averages

Learning Outcomes:

  • Understand the role of Generative AI in forecasting
  • Evaluate prompt engineering techniques for forecasting
  • Apply moving averages to smooth time series data
  • Implement forecasting methods in Python

Course Roadmap: Where We Are

Week | Topic | Status
1 | Forecasting Foundations and Data Preparation | ✓ Complete
2 | Generative AI in Forecasting and Moving Averages | ← Today
3 | Stationarity, Correlation Analysis, Time Series Properties | Upcoming
4 | Introductory Forecasting Methods | Upcoming

The AI Revolution in Forecasting

Traditional Approach

  • Manual statistical analysis
  • Limited pattern recognition
  • Time-intensive processes
  • Difficulty handling complexity
  • Reactive to changes

AI-Enhanced Approach

  • Automated pattern detection
  • Advanced temporal analysis
  • Rapid processing at scale
  • Complex interdependencies
  • Adaptive learning

Key Insight: Generative AI transforms forecasting from a purely statistical exercise into an intelligent, interpretive process that enhances human decision-making.

Large Language Models (LLMs)

LLMs are advanced AI systems that understand, generate, and interact with human language at scale using deep learning and vast datasets.

Key Features

Transformer Architecture

Efficiently processes text sequences and captures complex patterns in data relationships.

Massive Training Scale

Billions or trillions of parameters enable learning of grammar, semantics, and factual knowledge.

Model | Developer | Primary Use in Forecasting
ChatGPT | OpenAI | Data generation, scenario analysis
Claude | Anthropic | Analytical reasoning, code assistance
Gemini | Google | Multi-modal data integration

How LLMs Support Forecasting

1. Pattern Recognition

LLMs identify complex temporal patterns and anomalies in high-dimensional datasets that traditional statistical methods may overlook.

2. Natural Language Insights

Transform numerical forecasts into meaningful business narratives, explaining trends, seasonality, and confidence intervals in clear language.

3. Multi-Modal Integration

Combine time series data with external text sources, news sentiment, and market indicators for comprehensive forecasting models.

Quiz 1: Understanding LLMs

Which of the following is NOT a primary advantage of using LLMs in forecasting?

A. Automated pattern detection in complex data
B. Translation of numerical results into business language
C. Guaranteed 100% accurate predictions
D. Integration of multiple data sources

Key Roles of Generative AI in Forecasting

Refines Predictive Models

Improves accuracy and adaptability through intelligent model optimization and continuous learning algorithms.

Automates Data Analysis

Handles vast datasets efficiently with intelligent processing pipelines that scale seamlessly with data volume.

Simulates Complex Scenarios

Enables advanced "what-if" exploration through sophisticated scenario modeling and simulation capabilities.

Generates Synthetic Data

Creates high-quality synthetic data that preserves statistical properties while ensuring privacy and compliance.

Synthetic Data Generation

Pattern Analysis

GenAI identifies statistical properties and temporal patterns in existing datasets

Synthetic Generation

Creates realistic time series data maintaining original characteristics

Validation

Tests model robustness across diverse synthetic scenarios
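
The three steps above can be sketched without any GenAI tool at all, which helps show what "maintaining original characteristics" means in practice. This is a minimal illustration, not a production generator: the `original` series, the function names `analyze` and `generate`, and all parameter values are assumptions made up for the example.

```python
import math
import random
import statistics

def analyze(series, season_length):
    """Pattern analysis: estimate level, linear trend, seasonal offsets, noise."""
    n = len(series)
    t_mean = (n - 1) / 2
    level = statistics.fmean(series)
    # Least-squares slope of the series against its time index.
    slope = sum((t - t_mean) * (y - level) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    detrended = [y - level - slope * (t - t_mean) for t, y in enumerate(series)]
    seasonal = [
        statistics.fmean(detrended[s::season_length]) for s in range(season_length)
    ]
    residuals = [d - seasonal[t % season_length] for t, d in enumerate(detrended)]
    return {
        "level": level,
        "slope": slope,
        "seasonal": seasonal,
        "noise_sd": statistics.pstdev(residuals),
    }

def generate(profile, n, seed):
    """Synthetic generation: rebuild trend and seasonality, then add fresh noise."""
    rng = random.Random(seed)
    t_mean = (n - 1) / 2
    seasonal = profile["seasonal"]
    return [
        profile["level"]
        + profile["slope"] * (t - t_mean)
        + seasonal[t % len(seasonal)]
        + rng.gauss(0, profile["noise_sd"])
        for t in range(n)
    ]

# Hypothetical "original" monthly series (3 years); it stands in for real data.
rng = random.Random(7)
original = [
    200 + 2 * t + 15 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 4)
    for t in range(36)
]

profile = analyze(original, season_length=12)
synthetic = generate(profile, n=len(original), seed=11)

# Validation: the synthetic series should echo the original's summary statistics.
print("mean :", round(statistics.fmean(original), 1), round(statistics.fmean(synthetic), 1))
print("stdev:", round(statistics.pstdev(original), 1), round(statistics.pstdev(synthetic), 1))
```

Comparing summary statistics is only a first check; a fuller validation would also compare seasonal shape and autocorrelation.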

Why Synthetic Data Matters

Challenge | Solution
Data Scarcity | Fills gaps when historical data is limited or unavailable
Privacy Concerns | Protects sensitive information while enabling model training
Development Costs | Faster and less expensive than collecting real-world data

Benefits of Synthetic Data

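Key Takeaway: Synthetic data generation can help organizations develop forecasting models faster and at lower cost while reducing privacy risk. Headline figures such as "3× faster" or "50% cost reduction" are illustrative rather than guaranteed, and synthetic data does not by itself ensure 100% privacy compliance.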

Scenario Simulation with GenAI

Scenario simulation transforms forecasting into a strategic tool for preparedness and resilience by exploring multiple potential futures.

Why Scenario Simulation Matters

  • Beyond Single Predictions: Explore best-case, worst-case, and average-case outcomes
  • Risk Management: Identify vulnerabilities and opportunities before they occur
  • Resilient Strategies: Develop flexible plans for uncertain conditions

Real-World Example: Insurance Industry

Model synthetic catastrophic events (e.g., Category 5 hurricane). GenAI can:

  • Forecast claim surges across policy types
  • Estimate payout volumes and operational bottlenecks
  • Refine risk exposure models without waiting for actual disasters
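
The idea of exploring several futures instead of one can be illustrated with a small Monte Carlo sketch. It is a plain simulation, not a GenAI model: in practice the scenario assumptions might be drafted with a GenAI tool and then fed into code like this. Every number below (baseline claims, surge multipliers, spread) is hypothetical.

```python
import random
import statistics

# Hypothetical inputs: none of these figures come from real insurance data.
BASELINE_WEEKLY_CLAIMS = 1_000
SURGE_MULTIPLIERS = {"best-case": 1.5, "baseline": 3.0, "worst-case": 6.0}

def simulate_claims(multiplier, runs=2_000, seed=42):
    """Draw weekly claim counts scattered around a scenario's expected surge."""
    rng = random.Random(seed)
    draws = sorted(
        BASELINE_WEEKLY_CLAIMS * multiplier * rng.lognormvariate(0, 0.25)
        for _ in range(runs)
    )
    return {
        "mean": statistics.fmean(draws),
        "p95": draws[runs * 95 // 100 - 1],  # empirical 95th percentile
    }

results = {name: simulate_claims(m) for name, m in SURGE_MULTIPLIERS.items()}
for name, summary in results.items():
    print(f"{name}: mean={summary['mean']:.0f} claims, p95={summary['p95']:.0f} claims")
```

Reporting a high percentile alongside the mean is a deliberate choice: capacity planning for claim surges usually depends on the bad weeks, not the average one.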

Quiz 2: Synthetic Data Applications

A retail company wants to forecast demand for a new product with no historical sales data. What is the BEST use of synthetic data in this scenario?

A. Replace all real customer data permanently
B. Generate scenarios based on similar products to test forecasting models
C. Use it as the final forecast without validation
D. Share it publicly to get customer feedback

Prompt Engineering for Forecasting

"The quality of your AI output is directly proportional to the specificity and context of your prompts"

Three Core Principles

1. Context Setting

Provide comprehensive background about your data, industry, and forecasting objectives to guide AI reasoning.

Example: "Forecast demand for electric vehicles in Australia over the next 5 years"

2. Constraint Definition

Specify statistical requirements, confidence levels, and business constraints to ensure practical outputs.

Example: "Assume oil prices remain above US$100/barrel and government subsidies continue"

3. Output Formatting

Request structured responses with clear explanations, uncertainty quantification, and actionable insights.

Example: "Present results in a table with columns for scenario, assumptions, and business impact"
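
One way to make the three principles habitual is to assemble prompts from separate context, constraint, and format pieces. The helper below is a hypothetical sketch (the function name and layout are assumptions, not part of any GenAI tool's API); it only builds a text string that you would paste into a chat interface.

```python
def build_forecast_prompt(context, constraints, output_format):
    """Assemble a prompt from the three principles: context, constraints, format."""
    if not context.strip():
        raise ValueError("context must not be empty")
    lines = [f"Context: {context.strip()}"]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {c.strip()}" for c in constraints)
    lines.append(f"Output format: {output_format.strip()}")
    return "\n".join(lines)

prompt = build_forecast_prompt(
    context="Forecast demand for electric vehicles in Australia over the next 5 years",
    constraints=[
        "Assume oil prices remain above US$100/barrel",
        "Assume government subsidies continue",
    ],
    output_format=(
        "Present results in a table with columns for scenario, "
        "assumptions, and business impact"
    ),
)
print(prompt)
```

Keeping the three parts separate makes it easy to vary one (for example, the constraints) while holding the others fixed when comparing responses.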

Prompt Engineering: Best Practices

Principle | Poor Example | Good Example
Define Context | "Forecast demand" | "Forecast quarterly demand for smartphones in Southeast Asia, 2025-2027"
Specify Variables | "Include important factors" | "Include GDP growth, competitor pricing, and seasonal trends"
Request Scenarios | "Give me a forecast" | "Provide best-case (20%), baseline (60%), and worst-case (20%) scenarios"
Structure Output | "Tell me the results" | "Create a table with columns: Scenario, Key Assumptions, Forecast Range, Confidence Level"

Practical Prompt Engineering Example

❌ Ineffective Prompt

"Generate customer data"

Problems:

  • No specifications
  • No variables defined
  • No business context
  • Unpredictable output

VS

✓ Effective Prompt

"Generate 200 rows of synthetic retail customer data with variables: Customer_ID, Age (18-65), Location (Australian cities), Average_Spend, Preferred_Channel. Ensure customers under 25 spend less than $200 on average, and 30% prefer online shopping."

Strengths:

  • Clear row count
  • Defined variables with ranges
  • Business logic specified
  • Realistic constraints
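
Because the effective prompt states explicit rules, its output can be checked mechanically. The sketch below is a minimal example of such a check. The `make_rows` function is a hypothetical stand-in for the rows a GenAI tool would return, and the city list and the "In-store" channel label are assumptions for illustration.

```python
import random
import statistics

CITIES = ["Sydney", "Melbourne", "Brisbane", "Perth", "Adelaide"]  # assumed list

def make_rows(n=200, seed=1):
    """Stand-in for GenAI output: rows that are meant to follow the prompt's rules."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        age = rng.randint(18, 65)
        # Under-25s draw from a lower spend range so their average stays below $200.
        spend = rng.uniform(50, 180) if age < 25 else rng.uniform(100, 600)
        rows.append({
            "Customer_ID": f"C{i + 1:04d}",
            "Age": age,
            "Location": rng.choice(CITIES),
            "Average_Spend": round(spend, 2),
            "Preferred_Channel": "Online" if i < round(0.3 * n) else "In-store",
        })
    return rows

def check_rows(rows):
    """Return one pass/fail flag per rule stated in the prompt."""
    young = [r["Average_Spend"] for r in rows if r["Age"] < 25]
    online_share = sum(r["Preferred_Channel"] == "Online" for r in rows) / len(rows)
    return {
        "row_count": len(rows) == 200,
        "age_range": all(18 <= r["Age"] <= 65 for r in rows),
        "unique_ids": len({r["Customer_ID"] for r in rows}) == len(rows),
        "under_25_spend": bool(young) and statistics.fmean(young) < 200,
        "online_share": abs(online_share - 0.30) < 0.05,
    }

report = check_rows(make_rows())
for rule, passed in report.items():
    print(f"{rule}: {'ok' if passed else 'FAILED'}")
```

The same `check_rows` function could be pointed at rows pasted from a real GenAI response; a failed flag is a cue to refine the prompt.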

Activity: Prompt Engineering for Forecasting

Your Task: Create a Prompt for Synthetic Sales Data

Scenario: You are forecasting weekly sales for a coffee shop chain with 15 locations across Brisbane.

Requirements:

  • 2 years of weekly data (104 weeks)
  • Variables: Date, Location, Units_Sold, Revenue, Weather_Condition
  • Seasonal patterns: Higher sales in winter months (June-August)
  • Weekend sales typically 40% higher than weekdays
  • Weather impact: Rainy days show 15% increase in sales

Prompt Template to Complete:

"Generate ____ rows of synthetic weekly sales data for ____. Include variables: ____. Apply the following business rules: ____. Ensure seasonal patterns: ____."

Discussion: How would you validate that the synthetic data is realistic?

Quiz 3: Prompt Engineering

You want to create synthetic data for forecasting hospital bed occupancy. Which prompt element is MOST critical for realistic data generation?

A. Requesting a large number of rows (10,000+)
B. Specifying seasonal patterns and weekly cycles in patient admissions
C. Using technical medical terminology
D. Asking for data in alphabetical order

Ethical Considerations and Limitations

Bias and Fairness

AI models may perpetuate historical biases present in training data. Regular auditing and diverse validation datasets are essential for equitable forecasting.

Transparency Requirements

Maintain clear audit trails and provide interpretable explanations for business stakeholders. Avoid "black box" decision-making.

Data Privacy

Ensure sensitive time series data is protected when using cloud-based AI services. Consider on-premises solutions for highly confidential forecasting projects.

Key Limitations

  • Context Understanding: AI may miss nuanced business context that humans easily recognize
  • Hallucination Risk: Models may generate plausible but incorrect insights
  • Data Quality Dependency: AI amplifies data quality issues rather than fixing them
  • Domain Specificity: Generic models may not capture industry-specific patterns

Appropriate Use of GenAI in Academic Work

Appropriate Use ✓ | Inappropriate Use ✗
Brainstorming ideas and draft structures | Submitting AI-generated text as original work
Summarizing papers to aid comprehension | Using AI to avoid reading assigned texts
Refining or proofreading your own writing | Having AI complete entire assessments
Generating synthetic data for testing (if permitted) | Misrepresenting AI data as empirical research
Debugging code with AI assistance | Copying AI-generated code without understanding

Always cite GenAI use: Include prompts and responses in appendices, and acknowledge AI assistance in your methodology section.

GenAI in Forecasting: Key Takeaways

Start Small, Scale Smart

Begin with AI-assisted interpretation and code generation before implementing fully automated workflows.

Maintain Human Oversight

AI enhances but never replaces domain expertise and critical thinking in forecasting decisions.

Continuous Learning

Stay current with emerging AI forecasting tools and techniques through hands-on experimentation.

Transition to Moving Averages

Now that you understand how GenAI can support forecasting, we will apply these concepts to a fundamental forecasting technique: Moving Averages. You will use both traditional methods AND GenAI to deepen your understanding.

Smoothing Techniques in Time Series

Smoothing techniques reduce random variation or noise in data to reveal underlying trends and patterns.

Purpose

  • Reduce random noise in time series data
  • Reveal underlying trends and patterns
  • Enable more accurate forecasting

Key Concept

Smoothing filters out short-term fluctuations and highlights long-term signals.

Common Applications

  • Sales forecasting
  • Stock price trend analysis
  • Economic indicator monitoring
  • Quality control in manufacturing

Visualizing the Effect of Smoothing

3-Point Moving Average

Moderate smoothing, retains more detail, responds faster to changes

5-Point Moving Average

Greater smoothing, clearer trend, slower response to changes
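
The contrast between the two window sizes can also be put in numbers. The sketch below is an illustration under assumed data: the noisy series is randomly generated, and "roughness" (mean absolute change between consecutive points) is just one simple, hypothetical way to quantify smoothness.

```python
import math
import random

def trailing_ma(values, k):
    """k-point moving average; the first k - 1 positions have no value."""
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]

def roughness(values):
    """Mean absolute change between consecutive points (lower means smoother)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)

rng = random.Random(0)
# Hypothetical noisy series: a slow wave plus random noise.
series = [100 + 10 * math.sin(t / 15) + rng.gauss(0, 5) for t in range(500)]

for label, data in [
    ("raw", series),
    ("3-point", trailing_ma(series, 3)),
    ("5-point", trailing_ma(series, 5)),
]:
    print(f"{label:8s} roughness = {roughness(data):.2f}")
```

The 5-point average should show the lowest roughness, which is the numerical counterpart of "greater smoothing, slower response".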

Simple Moving Average: The Formula

A k-period moving average uses the arithmetic mean of the k most recent time periods to forecast the next value.

$$\hat{Y}_{t+1} = \frac{Y_t + Y_{t-1} + Y_{t-2} + \dots + Y_{t-k+1}}{k}$$

Understanding the Components

Symbol | Meaning | Example
$\hat{Y}_{t+1}$ | Forecast for next period | Sales forecast for Week 6
$Y_t$ | Actual value at current period | Actual sales in Week 5
k | Number of periods to average | 3 weeks, 4 quarters, etc.
t | Current time period | Week 5, Quarter 3, etc.
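
The formula translates almost directly into code. The function below is a minimal sketch (the name `sma_forecast` and the sales figures are made up for illustration): it averages the k most recent observations to forecast the next period.

```python
def sma_forecast(values, k):
    """One-step-ahead forecast: mean of the k most recent observations."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(values) < k:
        raise ValueError("need at least k observations")
    return sum(values[-k:]) / k

weekly_sales = [200, 210, 190, 220, 230]  # hypothetical Weeks 1-5
forecast_week_6 = sma_forecast(weekly_sales, k=3)
print(f"Week 6 forecast: {forecast_week_6:.2f}")  # (190 + 220 + 230) / 3
```

Note that only the last k values matter: Weeks 1 and 2 have no influence on the Week 6 forecast when k = 3.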

Simple Moving Average: Worked Example

Scenario: Coffee shop daily cup sales (3-day moving average)

Day | Actual Sales | 3-Day Moving Average | Calculation
Monday | 120 | - | Not enough data
Tuesday | 115 | - | Not enough data
Wednesday | 118 | - | Not enough data
Thursday | 125 | 117.67 | (120 + 115 + 118) / 3
Friday | 130 | 119.33 | (115 + 118 + 125) / 3
Saturday | - | 124.33 | (118 + 125 + 130) / 3 ← Forecast

Interpretation: Based on the last 3 days (Wed-Fri), we forecast Saturday sales of approximately 124 cups.
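
The same table can be reproduced in pandas. This is a sketch under one assumption about layout: each day's moving-average entry is treated as the forecast for that day, so the rolling mean is shifted down by one row.

```python
import pandas as pd

sales = pd.Series(
    [120, 115, 118, 125, 130],
    index=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    name="Actual Sales",
)

# rolling(3).mean() on a given day averages that day and the two before it.
rolling_mean = sales.rolling(window=3).mean()

# Shifting by one row turns each average into the forecast for the following day.
forecast = rolling_mean.shift(1)

table = pd.DataFrame({"Actual Sales": sales, "3-Day MA Forecast": forecast.round(2)})
print(table)

# The average of Wed-Fri is the forecast for Saturday, which has no row yet.
saturday_forecast = round(float(rolling_mean.iloc[-1]), 2)
print("Saturday forecast:", saturday_forecast)
```

Forgetting the `shift(1)` is a common slip: without it, Thursday's entry would include Thursday's own sales, which is not available when the forecast is made.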

Quiz 4: Simple Moving Average Calculation

A store's daily sales for the past 4 days were: 50, 60, 55, 65 units. Using a 3-day moving average, what is the forecast for day 5?

A. 57.5 units
B. 60 units
C. 55 units
D. 65 units

Solution: (60 + 55 + 65) / 3 = 180 / 3 = 60 units

Choosing the Right k Value

The choice of k (number of periods) significantly impacts forecast performance:

Smaller k (e.g., k=3)

Advantages:

  • Responds quickly to changes
  • Captures recent trends
  • Good for volatile data

Disadvantages:

  • More sensitive to noise
  • Less smooth curve
  • May overreact to outliers

Larger k (e.g., k=12)

Advantages:

  • Smoother trend line
  • Reduces noise impact
  • Stable forecasts

Disadvantages:

  • Slow to respond to changes
  • May miss recent trends
  • Lags behind actual shifts

Practical Guideline: For seasonal data, set k equal to the season length (e.g., k=4 for quarterly data, k=12 for monthly data with yearly seasonality).
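
The trade-off can also be checked empirically by scoring several candidate k values on past data. The sketch below uses mean absolute error (MAE) as the score, which is one reasonable choice among several; the demand figures and the candidate list are hypothetical.

```python
def sma_mae(values, k, start):
    """MAE of one-step-ahead k-period forecasts for periods start..end."""
    errors = [
        abs(values[t] - sum(values[t - k:t]) / k)
        for t in range(start, len(values))
    ]
    return sum(errors) / len(errors)

# Hypothetical demand history with an upward drift.
demand = [52, 48, 55, 60, 58, 63, 67, 64, 70, 74, 71, 78, 82, 80, 85, 90]

candidates = [2, 3, 4, 6]
start = max(candidates)  # score every k on the same periods for a fair comparison
scores = {k: sma_mae(demand, k, start) for k in candidates}
best_k = min(scores, key=scores.get)

for k, mae in scores.items():
    print(f"k={k}: MAE={mae:.2f}")
print("lowest MAE at k =", best_k)
```

A score like this supplements the seasonal guideline rather than replacing it: for strongly seasonal data, matching k to the season length is usually the starting point.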

Centered Moving Average

When k is even (e.g., k=4 for quarterly data), the standard moving average falls between time periods. A centered moving average aligns the average with actual time periods.

Why Center Moving Averages?

  • Synchronization: Aligns smoothed values with original time series
  • Seasonal Analysis: Essential for identifying seasonal patterns accurately
  • Trend Estimation: Provides better trend estimates at specific time points

The Challenge with Even k

A 4-period moving average ending at period t is centered at time (t - 1.5), which falls between periods. To center it on period (t - 2), we average two consecutive 4-period moving averages: the one ending at period t - 1 (centered at t - 2.5) and the one ending at period t (centered at t - 1.5).

Centered Moving Average: Formula

For even k, the centered moving average at time t - k/2 is a weighted average:

$$\mathrm{CMA}_{t-k/2} = \frac{0.5\,Y_t + Y_{t-1} + Y_{t-2} + \dots + Y_{t-k+1} + 0.5\,Y_{t-k}}{k}$$

This is equivalent to averaging two consecutive k-period moving averages.

Key Features

  • End values weighted at 0.5: First and last observations in the window receive half weight
  • Middle values weighted at 1.0: All other observations receive full weight
  • Synchronized timing: Result is centered exactly at time t - k/2

Note: Centered moving averages cannot be used for forecasting future values, as they require future data. They are used for historical trend analysis and seasonal decomposition.
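
The claim that the weighted formula equals the average of two consecutive k-period moving averages can be verified with a short sketch. Function names and the sample series are made up for illustration, and positions are zero-based list indices rather than period numbers.

```python
def centered_moving_average(values, k):
    """Weighted form for even k; keys are the indices each value is centered on."""
    if k < 2 or k % 2 != 0:
        raise ValueError("this sketch handles even k only")
    result = {}
    for t in range(k, len(values)):
        window = values[t - k:t + 1]  # Y_{t-k} ... Y_t, i.e. k + 1 observations
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        result[t - k // 2] = total / k
    return result

def average_of_two_mas(values, k):
    """Equivalent form: mean of the k-period MAs ending at t - 1 and at t."""
    result = {}
    for t in range(k, len(values)):
        ma_previous = sum(values[t - k:t]) / k
        ma_current = sum(values[t - k + 1:t + 1]) / k
        result[t - k // 2] = (ma_previous + ma_current) / 2
    return result

series = [10, 12, 14, 16, 18, 20]  # hypothetical, perfectly linear
weighted = centered_moving_average(series, k=4)
averaged = average_of_two_mas(series, k=4)
print(weighted)  # centered values for indices 2 and 3
print(averaged)
```

On a perfectly linear series the centered average returns the original values at the centered positions, which is a quick sanity check that the alignment is right.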

Centered Moving Average: Worked Example

Quarterly Sales Data ($ millions):

Quarter | Sales | 4-Period MA | Centered MA | Calculation
Q1 | 4.71 | - | - | Not enough data
Q2 | 4.75 | - | - | Not enough data
Q3 | 4.63 | - | 4.64 | (0.5×4.71 + 4.75 + 4.63 + 4.74 + 0.5×4.19) / 4
Q4 | 4.74 | 4.71 | - | (4.71 + 4.75 + 4.63 + 4.74) / 4
Q5 | 4.19 | 4.58 | - | (4.75 + 4.63 + 4.74 + 4.19) / 4

Interpretation: The centered moving average of 4.64 at Q3 (the average of the two 4-period moving averages) represents the smoothed trend level for that quarter, removing seasonal fluctuations.
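
A minimal pandas sketch that recomputes the Q3 centered value from the sales figures in the table. The approach (a rolling mean of the rolling mean, shifted back by k/2 rows) is one common way to do this and is offered as an illustration, not as the only method.

```python
import pandas as pd

sales = pd.Series(
    [4.71, 4.75, 4.63, 4.74, 4.19],
    index=["Q1", "Q2", "Q3", "Q4", "Q5"],
    name="Sales",
)

# Trailing 4-period MA, labelled at the last period of each window.
ma4 = sales.rolling(window=4).mean()

# Average each pair of consecutive 4-period MAs, then shift back k/2 = 2 rows
# so the result sits on the period it is centered on.
cma = ma4.rolling(window=2).mean().shift(-2)

table = pd.DataFrame({"Sales": sales, "4-Period MA": ma4, "Centered MA": cma})
print(table.round(4))
```

Only Q3 receives a centered value here because a 4-period centered average needs two observations on each side plus half-weights at both ends, five quarters in total.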

Practice Exercise: iPhone Demand Forecast

Task: Calculate a 2-Month Moving Average

Monthly iPhone demand (millions of units):

Month | 1 | 2 | 3 | 4 | 5 | 6
Demand | 13 | 17 | 19 | 23 | 24 | ?
2-Month MA | - | - | ? | ? | ? | ?

Questions:

  1. Complete the 2-month moving average column
  2. What is your forecast for month 6?
  3. Calculate the forecast error for months 3-5 (actual - forecast)
  4. Would a 3-month moving average be better? Why or why not?

Hint: Start by calculating MA for month 3: (13 + 17) / 2 = 15

Quiz 5: Moving Average Applications

Which statement about moving averages is TRUE?

A. Larger k values respond more quickly to recent changes in the data
B. Moving averages can predict values far into the future with high accuracy
C. For seasonal data, k should typically equal the number of seasons in a cycle
D. Moving averages work best when data has a strong upward or downward trend

Comparing Different k Values

Observation: Notice how the 7-period moving average (blue) is much smoother but lags behind the original data more than the 3-period average (orange). The choice of k depends on your forecasting objectives and data characteristics.

Implementing Moving Averages in Python

In the practical session, you will implement moving averages using Python and pandas:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load data
    data = pd.read_excel('sales_data.xlsx')

    # Calculate 3-period simple moving average
    data['MA_3'] = data['Sales'].rolling(window=3).mean()

    # Calculate 7-period simple moving average
    data['MA_7'] = data['Sales'].rolling(window=7).mean()

    # Visualize
    plt.figure(figsize=(12, 6))
    plt.plot(data['Date'], data['Sales'], label='Actual Sales')
    plt.plot(data['Date'], data['MA_3'], label='3-Period MA')
    plt.plot(data['Date'], data['MA_7'], label='7-Period MA')
    plt.legend()
    plt.show()

You Can Also Use GenAI!

Try asking ChatGPT or Claude: "Write Python code to calculate a 4-period moving average on a DataFrame column called 'Revenue' and explain how it works."
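
A response to that prompt might resemble the sketch below. Actual model output will vary, so treat this as an illustration of what to expect and check any generated code by hand on a few rows. The revenue figures are hypothetical.

```python
import pandas as pd

# Hypothetical revenue figures for six periods.
df = pd.DataFrame({"Revenue": [100, 120, 110, 130, 150, 140]})

# Each MA_4 value is the mean of the current row and the three rows before it.
# The first three rows are NaN because fewer than four values are available.
df["MA_4"] = df["Revenue"].rolling(window=4).mean()

print(df)
```

A quick manual check: the fourth row should equal (100 + 120 + 110 + 130) / 4 = 115.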

Limitations of Moving Averages

1. Requires Stable Data

Moving averages work best when there is no strong trend. With a clear upward or downward trend, forecasts will systematically lag behind actual values.
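
The size of the lag can be worked out for the simplest case. For a perfectly linear trend with slope b, the k most recent values average to the value (k - 1)/2 periods back, so the one-step-ahead forecast falls short by b(k + 1)/2 every period. The sketch below checks this on a made-up series.

```python
def sma_forecast_errors(values, k):
    """Actual minus one-step-ahead k-period forecast, for each forecastable period."""
    return [values[t] - sum(values[t - k:t]) / k for t in range(k, len(values))]

slope = 5
trend = [100 + slope * t for t in range(12)]  # hypothetical linear trend

for k in (3, 5):
    errors = sma_forecast_errors(trend, k)
    expected_lag = slope * (k + 1) / 2
    print(f"k={k}: errors={sorted(set(errors))}, b(k+1)/2={expected_lag}")
```

The error is constant and grows with k, which is why a larger window lags a trend more.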

2. Loss of Data Points

A moving average needs k observations, so the first (k-1) periods have no moving average and the first k periods have no forecast. For small datasets, this can be a significant limitation.

3. Equal Weighting

Simple moving averages treat all observations equally. Recent data may be more relevant than older data, but this method doesn't account for that.

4. Not Suitable for Long-Term Forecasting

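Moving averages are designed for one-step-ahead forecasts. Projecting further ahead simply repeats the last average (or averages earlier forecasts), so the forecast flattens and carries no trend or seasonal information.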

Next Steps: More sophisticated methods like Exponential Smoothing (Week 4) and ARIMA (Week 7) address these limitations.

Week 2 Summary

Part 1: Generative AI in Forecasting

  • LLMs enhance forecasting through pattern recognition and natural language insights
  • Synthetic data enables testing, privacy protection, and scenario simulation
  • Effective prompt engineering is critical for quality AI outputs
  • Always maintain human oversight and ethical considerations

Part 2: Moving Averages

  • Smoothing techniques reduce noise to reveal underlying trends
  • Simple moving averages use the arithmetic mean of the k most recent periods
  • Centered moving averages align smoothed values with time periods
  • Choice of k balances responsiveness vs. smoothness

Next Week: Stationarity and Correlation Analysis

We will explore time series properties essential for advanced forecasting methods, including stationarity tests, autocorrelation, and data transformations.

Action Items

  • Complete Python Activity: Moving Averages (Jupyter Notebook)
  • Practice prompt engineering with GenAI tools
  • Review your understanding of smoothing concepts
  • Prepare questions for next week