
DATA4800

Artificial Intelligence and Machine Learning

Workshop 2

Introduction to Machine Learning

Term 2, 2025

Learning Outcomes

1. Characterise machine learning with a focus on supervised learning approaches
2. Explore and compare fundamental regression algorithms for prediction tasks
3. Review classification algorithms including K-Nearest Neighbors and Decision Trees

DATA4800 Course Roadmap

What is Machine Learning?

Machine Learning is a subset of artificial intelligence that enables computer systems to learn from data and improve performance on specific tasks without being explicitly programmed for every scenario.

Key Characteristics:

  • Learns patterns from historical data
  • Makes predictions on new, unseen data
  • Improves accuracy with more data and feedback
  • Automates decision-making processes

Types of Machine Learning

Quiz 1: Understanding ML Types

A bank wants to predict whether a customer will default on a loan based on historical data of past customers (with known outcomes). Which type of machine learning is most appropriate?

A. Unsupervised Learning
B. Supervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning

Factors Affecting ML Performance

Consider autonomous vehicles as an example of machine learning in action:

1. Data Quality & Quantity

More sensors (cameras, LIDAR, radar) provide richer input data, and higher-quality data leads to better decision-making.

2. Prior Knowledge

Pre-existing knowledge about road rules, object recognition, and regional differences (e.g., kangaroos in Australia) improves safety.

3. Learning Algorithms

The choice of algorithm, feature engineering, and model architecture significantly impact performance.

4. Adaptability & Feedback

Real-time learning from new situations and immediate feedback on decisions enables continuous improvement.

Machine Learning Applications

Industry | Application Examples | ML Task Type
Retail | Product recommendations, demand forecasting, price optimization | Classification, Regression
Healthcare | Disease diagnosis, patient risk assessment, drug discovery | Classification, Clustering
Finance | Fraud detection, credit scoring, algorithmic trading | Classification, Regression
Digital Marketing | Customer segmentation, ad targeting, churn prediction | Clustering, Classification

What is Supervised Learning?

Supervised learning uses labelled training data to learn patterns and relationships between input features and output targets. The algorithm learns from examples where the correct answer is known.

Supervised Learning Process

Step 1: Collect Labelled Data

Gather historical data where both inputs (features) and outputs (labels) are known. Example: Customer age, income, purchase history → Did they buy? (Yes/No)

Step 2: Train the Model

The algorithm learns patterns by finding relationships between features and labels in the training data.

Step 3: Make Predictions

Apply the trained model to new, unseen data to predict outcomes based on learned patterns.

Supervised Learning: Classification

Classification is a supervised learning task where the goal is to predict categorical outcomes (discrete classes).

Binary Classification

  • Email: Spam vs. Not Spam
  • Medical: Disease vs. Healthy
  • Finance: Fraud vs. Legitimate
  • Customer: Churn vs. Retain

Multi-class Classification

  • Flower species (Iris, Rose, Tulip)
  • Customer segments (High, Medium, Low value)
  • Product categories
  • Sentiment (Positive, Neutral, Negative)

Why Split Data into Training and Testing Sets?

Key Principle: We split data to evaluate how well our model generalizes to new, unseen data. Training and testing on the same data gives a misleadingly high accuracy estimate because it hides overfitting: the model is scored only on examples it has already seen.
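A minimal sketch of a 70/30 hold-out split using only the Python standard library. The helper name `holdout_split`, its 30% default test fraction and the toy dataset are illustrative assumptions, not part of the course material.

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (feature, label) pairs
data = [(x, "even" if x % 2 == 0 else "odd") for x in range(10)]
train, test = holdout_split(data)
print(len(train), len(test))  # 7 3
```

The model is fitted only on `train` and scored only on `test`, so the test score estimates performance on unseen data. In practice, libraries provide this step ready-made, for example scikit-learn's `sklearn.model_selection.train_test_split`.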

Quiz 2: Training and Testing

You train a model on 100% of your data and test it on the same data, achieving 99% accuracy. You then deploy it to production and find it performs poorly. What is the most likely explanation?

A. The model needs more training epochs
B. The model has overfit to the training data
C. The 70/30 split was incorrect
D. The model requires more features

Instance-Based Learning Concept

Core Idea: Similar examples should have similar labels. Classify new data points based on their similarity to training examples.

Key Questions:

  • How do we measure similarity? Distance metrics (Euclidean, Manhattan, etc.)
  • How many neighbors to consider? This is the parameter 'k' in K-Nearest Neighbors
  • How to resolve conflicts? Majority voting among neighbors
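The two distance metrics named above can be sketched in a few lines. This assumes points are tuples of numeric features; the function names are illustrative (Python 3.8+ also offers `math.dist` for Euclidean distance).

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences per feature."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```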

Understanding Regression

Regression is used when we want to predict continuous numerical values rather than categories.

Regression Type | Use Case | Output Range
Linear Regression | Predicting house prices, sales revenue, temperature | Continuous (-∞ to +∞)
Logistic Regression | Binary outcomes: customer churn, disease diagnosis | Probability (0 to 1)
Polynomial Regression | Non-linear relationships, curved patterns | Continuous (-∞ to +∞)

Linear vs. Logistic Regression

Three Regression Approaches Compared

Linear Regression

Use Cases:

  • Econometric modeling
  • Marketing mix models
  • Customer lifetime value

Output: Continuous → Continuous

Example: Each one-unit increase in marketing spend is associated with a $50 increase in sales
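A minimal sketch of fitting a one-feature linear regression by ordinary least squares. The spend and sales figures are hypothetical, chosen so the fitted slope comes out at exactly $50 per unit to match the example above; `fit_simple_linear_regression` is an illustrative helper, not a library function.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: returns (b0, b1) for y ≈ b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x), both unnormalised
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x  # the fitted line passes through the means
    return b0, b1

spend = [1, 2, 3, 4, 5]              # hypothetical marketing spend (units)
sales = [150, 200, 250, 300, 350]    # hypothetical sales ($)
b0, b1 = fit_simple_linear_regression(spend, sales)
print(b0, b1)  # 100.0 50.0
```

Here b1 = 50 is read exactly as on the slide: one more unit of spend is associated with $50 more in sales.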

Logistic Regression

Use Cases:

  • Customer choice models
  • Click-through rate prediction
  • Credit scoring

Output: Continuous → Probability of a binary outcome (0 or 1)

Example: Probability customer will purchase = 0.73

The Logistic Function

The logistic function transforms any input value into a probability between 0 and 1, creating an S-shaped curve.


How Parameters Affect Predictions

  • Intercept (b₀): Shifts the curve left or right.
  • Slope (b₁): Controls how steep the transition between 0 and 1 is.
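A minimal sketch of the logistic curve p(x) = 1 / (1 + e^(-(b₀ + b₁x))), using the slide's b₀ and b₁. The function name `logistic` is an illustrative assumption. The two branches compute the same value but avoid overflow in `math.exp` when b₀ + b₁x is very negative.

```python
import math

def logistic(x, b0=0.0, b1=1.0):
    """Logistic (sigmoid) curve with intercept b0 and slope b1; output lies in [0, 1]."""
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)        # safe: z < 0, so e is between 0 and 1
    return e / (1.0 + e)

# The curve crosses 0.5 where b0 + b1*x = 0, i.e. at x = -b0 / b1
print(logistic(0))               # 0.5
print(logistic(2, b0=-4, b1=2))  # 0.5 (b0 has shifted the midpoint to x = 2)
# A larger b1 gives a steeper transition, so x = 1 lands closer to 1
print(logistic(1, b1=1), logistic(1, b1=5))
```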

Quiz 3: Regression Types

A retail company wants to predict the exact dollar amount each customer will spend next month based on their browsing history and past purchases. Which algorithm is most appropriate?

A. Linear Regression
B. Logistic Regression
C. K-Nearest Neighbors Classification
D. Decision Tree Classification

K-Nearest Neighbors (KNN) Algorithm

KNN is a simple yet powerful classification algorithm that assigns labels based on the majority class among the k closest training examples.

Core Principles:

  • No explicit training phase - stores all training data
  • Makes predictions by finding similar examples
  • Simple to understand and implement
  • Works well for many real-world problems

1-Nearest Neighbor (Simplest Case)


From 1-NN to K-NN

Using multiple neighbors reduces the impact of noise and outliers in the data.
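A minimal from-scratch sketch of k-NN majority voting, assuming Euclidean distance and a hypothetical 2-D dataset that contains one deliberately noisy point. `knn_predict` is an illustrative helper, not a library function. With k=1 the single noisy neighbor decides the answer; with k=3 it is outvoted.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the stored examples by Euclidean distance to the query (stable sort)
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest examples
    votes = Counter(label for _, label in by_distance[:k])
    # most_common(1) gives the top label; ties go to the label seen first (nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"), ((1, 1), "A"),
         ((0.5, 0.6), "B"),   # a noisy point sitting inside the A cluster
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]

print(knn_predict(train, (0.5, 0.5), k=1))  # B (copies the single noisy neighbor)
print(knn_predict(train, (0.5, 0.5), k=3))  # A (the noisy point is outvoted)
```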


KNN Step-by-Step Example

Scenario: Restaurant recommendation system predicting tip amount (Small/Large) based on food quality and service speed.

Training Data:

Food Quality | Service Speed | Tip Size
Great | Fast | Large
Great | Fast | Large
Mediocre | Fast | Small
Great | Slow | Large

New Customer: Food Quality = Great, Service Speed = Fast

Prediction (k=2): Large Tip (the 2 nearest neighbors are exact matches, and both left a Large tip)
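The restaurant example can be reproduced with a short sketch. It assumes a hypothetical numeric encoding (Great and Fast = 1, Mediocre and Slow = 0) and Euclidean distance; neither choice is specified on the slide, and `predict_tip` is an illustrative name.

```python
import math
from collections import Counter

# Hypothetical numeric encoding: 1 = Great / Fast, 0 = Mediocre / Slow
FOOD = {"Mediocre": 0, "Great": 1}
SPEED = {"Slow": 0, "Fast": 1}

# Training rows from the table: ((food quality, service speed), tip size)
training = [
    (("Great", "Fast"), "Large"),
    (("Great", "Fast"), "Large"),
    (("Mediocre", "Fast"), "Small"),
    (("Great", "Slow"), "Large"),
]

def encode(food, speed):
    return (FOOD[food], SPEED[speed])

def predict_tip(food, speed, k=2):
    query = encode(food, speed)
    # Rank training rows by Euclidean distance to the new customer
    ranked = sorted(training, key=lambda row: math.dist(encode(*row[0]), query))
    # Majority vote over the k closest rows
    votes = Counter(tip for _, tip in ranked[:k])
    return votes.most_common(1)[0][0]

print(predict_tip("Great", "Fast"))  # Large
```

Under this encoding, the new customer is at distance 0 from the first two rows and at distance 1 from the other two, so with k=2 both votes are Large.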

Selecting the Right Value of K

K Too Small (k=1)

Overfits to noise and outliers. High variance, low bias. Sensitive to individual data points.

K Too Large

Underfits the data. Low variance, high bias. May include irrelevant distant points.

Optimal K: Usually chosen through cross-validation; a common rule-of-thumb starting point is k ≈ √n, where n is the number of training samples
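A minimal sketch of choosing k with leave-one-out cross-validation. Each point is predicted from all the others, and the k with the best accuracy wins. The 1-D dataset (with one noisy label) and the helper names are hypothetical. On this data, k=1 is penalised by the noisy point, and the selected k matches the √n starting point.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(data, k):
    """Leave-one-out CV: predict each point from all the others; return accuracy."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]   # hold out point i
        correct += knn_predict(rest, x, k) == y
    return correct / len(data)

def choose_k(data, candidates):
    scores = {k: loocv_accuracy(data, k) for k in candidates}
    return max(scores, key=scores.get), scores   # first k with the top score

# Hypothetical 1-D data with one noisy "high" label at x = 2.5
data = ([((x,), "low") for x in (1, 2, 3, 4)] + [((2.5,), "high")]
        + [((x,), "high") for x in (7, 8, 9, 10)])

start_k = round(math.sqrt(len(data)))   # rule-of-thumb starting point
best_k, scores = choose_k(data, candidates=[1, 3, 5])
print(start_k, best_k)  # 3 3
```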

KNN: Strengths and Limitations

Advantages

  • Simple to understand and implement
  • No training phase (stores data directly)
  • Naturally handles multi-class problems
  • Effective for many practical applications
  • Can capture complex decision boundaries

Disadvantages

  • Slow predictions on large datasets
  • Sensitive to irrelevant features
  • Requires feature scaling/normalization
  • Struggles with high-dimensional data
  • Memory intensive (stores all training data)
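Why KNN needs feature scaling can be seen in a small sketch. The customers below are hypothetical (age in years, income in dollars), and min-max scaling is one common choice among several. Without scaling, income differences dominate the distance. After scaling, the nearest neighbor of the first customer changes.

```python
import math

def min_max_scale(rows):
    """Rescale every column to the range [0, 1] using its own min and max."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest_to_first(rows):
    """Index of the row (other than row 0) closest to row 0 by Euclidean distance."""
    return min(range(1, len(rows)), key=lambda i: math.dist(rows[0], rows[i]))

# Hypothetical customers: (age in years, annual income in dollars)
customers = [(30, 50_000), (65, 51_000), (31, 58_000), (70, 100_000)]

print(nearest_to_first(customers))                 # 1: income differences dominate
print(nearest_to_first(min_max_scale(customers)))  # 2: age now counts as well
```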

Best Practice: Start with KNN as a baseline model before trying more complex algorithms.

Quiz 4: K-Nearest Neighbors

You notice your KNN model performs perfectly on training data but poorly on test data. What is the most likely cause and solution?

A. K is too large; decrease k
B. K is too small (possibly k=1); increase k
C. Need more training data
D. Features need to be removed

Introduction to Decision Trees

A decision tree is a flowchart-like structure that makes decisions by asking a series of questions about the features, splitting the data at each node until reaching a final prediction.

Key Components:

  • Root Node: The first decision point (top of tree)
  • Internal Nodes: Decision points based on feature values
  • Branches: Outcomes of decisions (Yes/No, High/Low)
  • Leaf Nodes: Final predictions or classifications
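The components above map naturally onto a small data structure. This sketch hand-builds a tree for a hypothetical loan-default problem. The features `income` and `debt`, the thresholds and the class names are all illustrative. Prediction walks from the root through internal nodes along branches until it reaches a leaf.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    prediction: str   # leaf node: the final class label

@dataclass
class Node:
    feature: str      # feature tested at this decision point
    threshold: float  # branch left if value <= threshold, otherwise right
    left: "Tree"
    right: "Tree"

Tree = Union[Node, Leaf]

def predict(tree: Tree, example: dict) -> str:
    # Follow branches from the root until a leaf is reached
    while isinstance(tree, Node):
        tree = tree.left if example[tree.feature] <= tree.threshold else tree.right
    return tree.prediction

# Hypothetical tree: the root tests income, and one internal node tests debt
tree = Node("income", 40_000,
            left=Node("debt", 10_000, left=Leaf("repay"), right=Leaf("default")),
            right=Leaf("repay"))

print(predict(tree, {"income": 30_000, "debt": 15_000}))  # default
```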

Decision Trees for Business Decisions

Scenario: You must choose between opening a fast-food outlet or a bookshop. Each has different success rates and financial outcomes.

Understanding Expected Monetary Value (EMV)

Fast Food Outlet:

• 50% chance of success: +$1,000 per week

• 50% chance of failure: -$300 per week

EMV = (0.5 × $1,000) + (0.5 × -$300) = $500 - $150 = $350

Bookshop:

• 50% chance of success: +$900 per week

• 50% chance of failure: -$100 per week

EMV = (0.5 × $900) + (0.5 × -$100) = $450 - $50 = $400

Decision: Choose Bookshop (Higher EMV)
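The EMV arithmetic above generalises to any number of outcomes: multiply each payoff by its probability and sum. A minimal sketch using the slide's figures (weekly payoffs); `expected_monetary_value` is an illustrative helper name.

```python
def expected_monetary_value(outcomes):
    """EMV = sum of probability × payoff over all possible outcomes."""
    return sum(p * payoff for p, payoff in outcomes)

options = {
    "Fast food outlet": [(0.5, 1_000), (0.5, -300)],
    "Bookshop": [(0.5, 900), (0.5, -100)],
}
emv = {name: expected_monetary_value(o) for name, o in options.items()}
best = max(emv, key=emv.get)   # option with the highest EMV
print(emv)   # {'Fast food outlet': 350.0, 'Bookshop': 400.0}
print(best)  # Bookshop
```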

Decision Trees in Machine Learning

In ML, decision trees automatically learn the best questions to ask from data, creating classification or regression rules.

Algorithm: At each node, the tree selects the feature and split point that best separates the classes (maximizes information gain or minimizes impurity).
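A minimal sketch of one step of this search: scoring every candidate threshold on a single numeric feature by weighted Gini impurity and keeping the lowest. Gini is one common impurity measure (it is the default `criterion` of scikit-learn's `DecisionTreeClassifier`), while information gain uses entropy instead. The wait-time data and helper names are hypothetical. A full tree repeats this search over all features and recurses into each child.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(xs, ys):
    """Try a split 'x <= t' at each observed value; return (t, weighted Gini) of the best."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a useful split must send points to both sides
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: service wait time (minutes) vs tip size
wait_minutes = [2, 4, 5, 20, 25, 30]
tip = ["Large", "Large", "Large", "Small", "Small", "Small"]
print(best_threshold(wait_minutes, tip))  # (5, 0.0): a perfectly pure split
```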

Decision Tree Advantages

Highly Interpretable

Easy to understand and explain to non-technical stakeholders. You can visualize the exact decision path.

Handles Mixed Data Types

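Tree-splitting logic works with both numerical and categorical features in principle, although some widely used implementations still expect categorical features to be encoded numerically.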

Requires Minimal Data Preprocessing

No need for feature scaling or normalization. Robust to outliers.

Captures Non-linear Relationships

Can model complex interactions between features without manual feature engineering.

Quiz 5: Decision Trees

A decision tree achieves 100% accuracy on training data but only 65% on test data. Which technique would most likely improve test performance?

A. Increase tree depth to capture more patterns
B. Prune the tree or limit maximum depth
C. Add more features to the model
D. Use more training data without other changes

Week 2 Summary

Machine Learning Fundamentals

We learned about supervised, unsupervised, and reinforcement learning. Supervised learning uses labelled data to learn patterns and make predictions.

Regression Techniques

Linear regression predicts continuous values, while logistic regression predicts probabilities for binary outcomes. Both are foundational ML algorithms.

Classification Algorithms

K-Nearest Neighbors classifies based on similarity to neighbors. Decision trees create interpretable rules through sequential decisions.

Next Week: Decision Trees, Random Forests, and Ensemble Methods