Term 2, 2025
Characterise machine learning with a focus on supervised learning approaches
Explore and compare fundamental regression algorithms for prediction tasks
Review classification algorithms including K-Nearest Neighbors and Decision Trees
Machine Learning is a subset of artificial intelligence that enables computer systems to learn from data and improve performance on specific tasks without being explicitly programmed for every scenario.
Consider autonomous vehicles as an example of machine learning in action:
More sensors (cameras, LIDAR, radar) provide richer feedback. Higher quality data leads to better decision-making.
Pre-existing knowledge about road rules, object recognition, and regional differences (e.g., kangaroos in Australia) improves safety.
The choice of algorithm, feature engineering, and model architecture significantly impact performance.
Real-time learning from new situations and immediate feedback on decisions enable continuous improvement.
| Industry | Application Examples | ML Task Type |
|---|---|---|
| Retail | Product recommendations, demand forecasting, price optimization | Classification, Regression |
| Healthcare | Disease diagnosis, patient risk assessment, drug discovery | Classification, Clustering |
| Finance | Fraud detection, credit scoring, algorithmic trading | Classification, Regression |
| Digital Marketing | Customer segmentation, ad targeting, churn prediction | Clustering, Classification |
Supervised learning uses labelled training data to learn patterns and relationships between input features and output targets. The algorithm learns from examples where the correct answer is known.
Gather historical data where both inputs (features) and outputs (labels) are known. Example: Customer age, income, purchase history → Did they buy? (Yes/No)
The algorithm learns patterns by finding relationships between features and labels in the training data.
Apply the trained model to new, unseen data to predict outcomes based on learned patterns.
Classification is a supervised learning task where the goal is to predict categorical outcomes (discrete classes).
Key Principle: We split data into a training set and a test set to evaluate how well our model generalizes to new, unseen data. Testing on the same data used for training gives misleadingly high accuracy, because an overfitted model can simply memorise the training examples.
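The split described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `train_test_split`, the 25% test fraction, the seed, and the customer records are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records (assumption): (age, income, bought) with bought = 1 for Yes
data = [(25, 40_000, 0), (32, 52_000, 1), (47, 81_000, 1), (51, 39_000, 0),
        (38, 60_000, 1), (29, 45_000, 0), (60, 90_000, 1), (22, 30_000, 0)]

train, test = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))  # 6 2
```

The model would be fitted on `train` only, and its accuracy reported on `test`, which it has never seen.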
Core Idea: Similar examples should have similar labels. Classify new data points based on their similarity to training examples.
Regression is used when we want to predict continuous numerical values rather than categories.
| Regression Type | Use Case | Output Range |
|---|---|---|
| Linear Regression | Predicting house prices, sales revenue, temperature | Continuous (-∞ to +∞) |
| Logistic Regression | Binary outcomes: customer churn, disease diagnosis | Probability (0 to 1) |
| Polynomial Regression | Non-linear relationships, curved patterns | Continuous (-∞ to +∞) |
Use Cases:
Input → Output: Continuous → Continuous
Example: One unit increase in marketing spend increases sales by $50
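To make the slope interpretation concrete, here is a small sketch of ordinary least squares for a single feature. The data are made up for illustration (an assumption, not from the lecture) and are constructed so that each extra unit of marketing spend adds exactly $50 of sales; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data (assumption): sales = 200 + 50 * spend, with no noise
spend = [1, 2, 3, 4, 5]
sales = [250, 300, 350, 400, 450]

b0, b1 = fit_line(spend, sales)
print(b0, b1)  # 200.0 50.0
```

The fitted slope `b1` is the "one unit increase" effect quoted in the example; with real, noisy data it would be an estimate rather than an exact value.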
Use Cases:
Input → Output: Continuous → Binary (0 or 1)
Example: Probability customer will purchase = 0.73
The logistic function transforms any input value into a probability between 0 and 1, creating an S-shaped curve.
Intercept (b₀): Shifts the curve left or right. Slope (b₁): Controls how steep the transition is between 0 and 1.
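The roles of b₀ and b₁ can be checked numerically with a short sketch of the logistic function p = 1 / (1 + e^-(b₀ + b₁x)). The function names `logistic` and `midpoint` and the parameter values below are illustrative assumptions.

```python
import math

def logistic(x, b0=0.0, b1=1.0):
    """Probability p = 1 / (1 + exp(-(b0 + b1 * x))), computed stably."""
    z = b0 + b1 * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow when z is very negative
    return e / (1.0 + e)

def midpoint(b0, b1):
    """Input value at which the predicted probability is exactly 0.5."""
    return -b0 / b1

print(logistic(0.0))                     # 0.5
print(midpoint(b0=-2.0, b1=0.5))         # 4.0

# A larger slope gives a sharper transition around the midpoint
print(logistic(1.0, b1=5.0) > logistic(1.0, b1=1.0))  # True
```

Changing b₀ moves the 0.5 crossing point to x = -b₀ / b₁, while increasing b₁ squeezes the S-curve so that probabilities move from near 0 to near 1 over a narrower range of x.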
KNN is a simple yet powerful classification algorithm that assigns labels based on the majority class among the k closest training examples.
Using multiple neighbors reduces the impact of noise and outliers in the data.
Scenario: Restaurant recommendation system predicting tip amount (Small/Large) based on food quality and service speed.
| Food Quality | Service Speed | Tip Size |
|---|---|---|
| Great | Fast | Large |
| Great | Fast | Large |
| Mediocre | Fast | Small |
| Great | Slow | Large |
New Customer: Food Quality = Great, Service Speed = Fast
Prediction (k=2): Large Tip (matches 2 nearest neighbors)
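The worked example above can be reproduced with a minimal KNN sketch. Encoding the categories as 0/1 and using squared Euclidean distance are assumptions made for illustration; the lecture does not specify a distance measure, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter

# Assumed numeric encoding of the categorical features
QUALITY = {"Mediocre": 0, "Great": 1}
SPEED = {"Slow": 0, "Fast": 1}

# Training rows from the table: ((food quality, service speed), tip size)
training = [
    (("Great", "Fast"), "Large"),
    (("Great", "Fast"), "Large"),
    (("Mediocre", "Fast"), "Small"),
    (("Great", "Slow"), "Large"),
]

def encode(features):
    quality, speed = features
    return (QUALITY[quality], SPEED[speed])

def distance(a, b):
    # Squared Euclidean distance; it ranks neighbours the same way as Euclidean
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn_predict(query, examples, k):
    """Return the majority label among the k examples closest to query."""
    q = encode(query)
    ranked = sorted(examples, key=lambda ex: distance(encode(ex[0]), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

prediction = knn_predict(("Great", "Fast"), training, k=2)
print(prediction)  # Large
```

Both nearest neighbours are the two identical Great/Fast rows, so the vote is unanimous. With an even k a tied vote is possible in general; this sketch then returns whichever tied label was encountered first, which is an arbitrary choice.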
Small k (e.g., k = 1): Overfits to noise and outliers. High variance, low bias. Sensitive to individual data points.
Large k (approaching n): Underfits the data. Low variance, high bias. May include irrelevant distant points.
Optimal K: Usually found through cross-validation. A common rule of thumb is to start the search near √n, where n = number of training samples.
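One way to carry out that search is leave-one-out cross-validation: hold out each example in turn, predict it from the rest, and compare accuracy across candidate values of k. The sketch below is a rough illustration under stated assumptions: the 2-D points, the candidate values (1, 3, 5), and the helper names are all made up for the example.

```python
from collections import Counter

def knn_predict(query, examples, k):
    """Majority label among the k examples closest to query (squared Euclidean)."""
    ranked = sorted(
        examples,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(examples, k):
    """Leave-one-out cross-validation: each example is the test set exactly once."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]   # everything except example i
        if knn_predict(features, rest, k) == label:
            correct += 1
    return correct / len(examples)

# Made-up points (assumption): two clusters plus one noisy label inside cluster A
points = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"), ((1.1, 1.1), "A"),
    ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B"), ((4.1, 4.3), "B"),
    ((1.0, 1.2), "B"),
]

scores = {k: loocv_accuracy(points, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)   # first k with the highest accuracy
print(scores, best_k)
```

In practice the candidate list would cover a wider range around √n, and k-fold cross-validation is usually cheaper than leave-one-out on larger datasets.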
Best Practice: Start with KNN as a baseline model before trying more complex algorithms.
A decision tree is a flowchart-like structure that makes decisions by asking a series of questions about the features, splitting the data at each node until reaching a final prediction.
Scenario: You must choose between opening a fast-food outlet or a bookshop. Each has different success rates and financial outcomes.
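Fast-food outlet:
• 50% chance of success: +$1,000 per week
• 50% chance of failure: -$300 per week
Bookshop:
• 50% chance of success: +$900 per week
• 50% chance of failure: -$100 per week
Expected monetary value (EMV): Fast-food outlet = 0.5 × $1,000 + 0.5 × (-$300) = $350 per week; Bookshop = 0.5 × $900 + 0.5 × (-$100) = $400 per week.
Decision: Choose Bookshop (Higher EMV)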
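The EMV comparison behind this decision can be written out as a short calculation. Assigning the first pair of outcomes to the fast-food outlet and the second pair to the bookshop is an assumption based on the order in the scenario; it is consistent with the bookshop having the higher EMV. The function name is hypothetical.

```python
def expected_monetary_value(outcomes):
    """Sum of probability * payoff over all outcomes of one option."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)

# (probability, weekly payoff in dollars); option labels are assumed, see above
options = {
    "Fast-food outlet": [(0.5, 1000), (0.5, -300)],
    "Bookshop": [(0.5, 900), (0.5, -100)],
}

emv = {name: expected_monetary_value(outcomes) for name, outcomes in options.items()}
best = max(emv, key=emv.get)

print(emv)   # {'Fast-food outlet': 350.0, 'Bookshop': 400.0}
print(best)  # Bookshop
```

The bookshop wins on expected value even though its best case is lower, because its downside is much smaller.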
In ML, decision trees automatically learn the best questions to ask from data, creating classification or regression rules.
Algorithm: At each node, the tree selects the feature and split point that best separates the classes (maximizes information gain or minimizes impurity).
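As a rough illustration of "minimizes impurity", the sketch below scores every candidate threshold on one numeric feature by the weighted Gini impurity of the two resulting groups. Gini is only one of several impurity measures, and the data (service time in minutes versus tip size) are made up for the example; `gini` and `best_split` are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Try midpoints between sorted unique values.

    Returns (threshold, weighted impurity); threshold is None if no split
    improves on the impurity of the unsplit node.
    """
    unique = sorted(set(values))
    best = (None, gini(labels))
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Made-up data (assumption): service time in minutes and the resulting tip
minutes = [5, 8, 10, 20, 25, 30]
tips = ["Large", "Large", "Large", "Small", "Small", "Small"]

threshold, impurity = best_split(minutes, tips)
print(threshold, impurity)  # 15.0 0.0
```

A full tree would repeat this search over every feature at every node and recurse into the left and right groups until a stopping rule is met.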
Easy to understand and explain to non-technical stakeholders. You can visualize the exact decision path.
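Works with both numerical and categorical features. In principle no encoding is required, although some implementations (for example, scikit-learn) still expect categorical features to be encoded as numbers.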
No need for feature scaling or normalization. Robust to outliers.
Can model complex interactions between features without manual feature engineering.
We learned about supervised, unsupervised, and reinforcement learning. Supervised learning uses labeled data to learn patterns and make predictions.
Linear regression predicts continuous values, while logistic regression predicts probabilities for binary outcomes. Both are foundational ML algorithms.
K-Nearest Neighbors classifies based on similarity to neighbors. Decision trees create interpretable rules through sequential decisions.
Next Week: Decision Trees, Random Forests, and Ensemble Methods