Machine Learning Types: Hands-On Exercises

Supervised vs Unsupervised & Classification vs Regression

DATA4800 - Artificial Intelligence and Machine Learning

Learning Objectives

Exercise 1: Basic Identification

Instructions: For each scenario below, identify:

  1. Is this supervised or unsupervised learning?
  2. If supervised, is it classification or regression?
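
The two questions above can be sketched as a small decision helper. This is only an illustration: the function name `identify_learning_type` and its parameters are assumptions made for this sketch, not part of the course material.

```python
from typing import Optional


def identify_learning_type(has_labels: bool,
                           target_is_numeric: Optional[bool] = None) -> str:
    """Answer the two identification questions for a single scenario."""
    # Question 1: do we have known outcomes (labels) from the past?
    if not has_labels:
        return "unsupervised"
    # Question 2 only applies to supervised problems.
    if target_is_numeric is None:
        raise ValueError("For supervised learning, say whether the target is numeric.")
    if target_is_numeric:
        return "supervised: regression"
    return "supervised: classification"


# Hypothetical case (not one of the scenarios below):
# labelled photos that must be tagged as "cat" or "dog".
print(identify_learning_type(has_labels=True, target_is_numeric=False))
```

Try describing each scenario in terms of these two inputs before writing down your answer.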

Scenario 1.1

A bank wants to automatically approve or reject loan applications based on each applicant's credit score, income, and employment history. They have 10,000 historical applications with known outcomes.

Hint: Do we have target labels (known outcomes) from the past?

Scenario 1.2

An e-commerce company wants to understand different types of customers based on their browsing behavior, purchase patterns, and demographics. They don't know what customer groups exist.

Hint: Are we trying to discover hidden patterns without known labels?

Scenario 1.3

A real estate company wants to estimate the selling price of houses based on size, location, age, and number of bedrooms. They have sales data from the past 5 years.

Hint: Are we predicting a continuous numerical value?

Exercise 2: Making Predictions with Simple Rules

Instructions: You will use existing data to make predictions about new cases using simple rules. No complex algorithms needed - just logical thinking!

🎯 Example: How to Make Rule-Based Decisions

Sample Data - Student Grade Prediction:

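| Student | Study Hours | Attendance % | Previous Test | Final Grade |
|---------|-------------|--------------|---------------|-------------|
| Alice   | 8           | 95%          | 85            | A           |
| Bob     | 3           | 70%          | 60            | C           |
| Carol   | 6           | 88%          | 78            | B           |
| David   | 2           | 60%          | 45            | D           |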

Example Rules I can create:

  • Rule 1: If Study Hours ≥ 7 AND Attendance ≥ 90% → Grade A
  • Rule 2: If Study Hours ≤ 3 AND Previous Test ≤ 60 → Grade C or D
  • Rule 3: If Study Hours = 5-6 AND Attendance ≥ 80% → Grade B

New Student to Predict: Emma: 7 Study Hours, 92% Attendance, Previous Test: 82

My Prediction: Grade A (because she meets Rule 1: Study Hours ≥ 7 AND Attendance ≥ 90%)
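
The three example rules can also be written as code. The sketch below is one possible encoding; the function name `predict_grade` and the "no rule applies" fallback are assumptions added for illustration, since the rules above do not cover every case (for example, 4 study hours).

```python
def predict_grade(study_hours: float, attendance_pct: float, previous_test: float) -> str:
    """Apply the three example rules in order and return the predicted grade."""
    # Rule 1: Study Hours >= 7 AND Attendance >= 90% -> Grade A
    if study_hours >= 7 and attendance_pct >= 90:
        return "A"
    # Rule 2: Study Hours <= 3 AND Previous Test <= 60 -> Grade C or D
    if study_hours <= 3 and previous_test <= 60:
        return "C or D"
    # Rule 3: Study Hours between 5 and 6 AND Attendance >= 80% -> Grade B
    if 5 <= study_hours <= 6 and attendance_pct >= 80:
        return "B"
    # Assumption: the example rules leave gaps, so report that explicitly.
    return "no rule applies"


# Emma: 7 study hours, 92% attendance, previous test 82
print(predict_grade(study_hours=7, attendance_pct=92, previous_test=82))  # prints A
```

Running the same function on Alice, Bob, Carol, and David is a quick way to check that the rules agree with the sample data.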

Dataset 2.1: Sales Performance Classification

Business Goal: Predict whether a new product will have High, Medium, or Low sales performance.

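| Product ID | Season | Marketing Spend | Price  | Competitor Price | Sales Performance |
|------------|--------|-----------------|--------|------------------|-------------------|
| P001       | Summer | $5,000          | $29.99 | $34.99           | Medium            |
| P002       | Winter | $8,000          | $49.99 | $45.99           | Low               |
| P003       | Fall   | $3,000          | $19.99 | $22.99           | High              |
| P004       | Spring | $6,000          | $39.99 | $41.99           | Medium            |
| P005       | Summer | $7,500          | $24.99 | $28.99           | High              |
| P006       | Winter | $2,000          | $59.99 | $55.99           | Low               |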

NEW PRODUCT: P007 - Season: Spring, Marketing: $4,500, Price: $34.99, Competitor Price: $38.99
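
Before writing rules, it can help to compute a derived column and see how it lines up with the labels. The sketch below computes the gap between each product's price and the competitor's price. The helper name `price_gap` is an assumption for illustration, and the price gap is only one of several features you might explore.

```python
# Historical products from Dataset 2.1:
# (product id, season, marketing spend, price, competitor price, sales performance)
products = [
    ("P001", "Summer", 5000, 29.99, 34.99, "Medium"),
    ("P002", "Winter", 8000, 49.99, 45.99, "Low"),
    ("P003", "Fall",   3000, 19.99, 22.99, "High"),
    ("P004", "Spring", 6000, 39.99, 41.99, "Medium"),
    ("P005", "Summer", 7500, 24.99, 28.99, "High"),
    ("P006", "Winter", 2000, 59.99, 55.99, "Low"),
]


def price_gap(price: float, competitor_price: float) -> float:
    """Positive means we are priced above the competitor; rounded to cents."""
    return round(price - competitor_price, 2)


# Print each product's label next to its price gap to look for a pattern.
for product_id, season, spend, price, competitor, performance in products:
    print(product_id, season, performance, price_gap(price, competitor))

# The same derived feature for the new product P007.
print("P007", "Spring", "?", price_gap(34.99, 38.99))
```

Use the printed values as evidence when you write your own rules; the code does not make the prediction for you.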

Dataset 2.2: Customer Segmentation

Business Goal: Group customers into 2 types and predict which group a new customer belongs to.

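| Customer ID | Age | Income  | Purchases/Year | Time on Site |
|-------------|-----|---------|----------------|--------------|
| C001        | 25  | $45,000 | 12             | 15 min       |
| C002        | 34  | $75,000 | 8              | 22 min       |
| C003        | 19  | $25,000 | 3              | 5 min        |
| C004        | 42  | $95,000 | 15             | 18 min       |
| C005        | 28  | $55,000 | 6              | 8 min        |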

NEW CUSTOMER: C006 - Age: 38, Income: $68,000, Purchases/Year: 11, Time on Site: 20 min
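
Because this dataset has no labels, one way to start is to measure how similar the new customer is to each existing one. The sketch below min-max scales every feature to the 0-1 range (so income does not dominate) and then ranks existing customers by Euclidean distance to C006. This is an assumed approach for illustration, not the required method; the helper names `scale` and `distance` are hypothetical.

```python
import math

# Dataset 2.2 rows: (age, income, purchases per year, minutes on site)
customers = {
    "C001": (25, 45_000, 12, 15),
    "C002": (34, 75_000, 8, 22),
    "C003": (19, 25_000, 3, 5),
    "C004": (42, 95_000, 15, 18),
    "C005": (28, 55_000, 6, 8),
}
new_customer = (38, 68_000, 11, 20)  # C006

# Per-feature minimum and maximum over the existing customers.
columns = list(zip(*customers.values()))
lows = [min(column) for column in columns]
highs = [max(column) for column in columns]


def scale(row):
    """Min-max scale one row so every feature lies roughly in [0, 1]."""
    return [(value - lo) / (hi - lo) for value, lo, hi in zip(row, lows, highs)]


def distance(a, b):
    """Euclidean distance between two customers after scaling."""
    return math.dist(scale(a), scale(b))


# Existing customers ordered from most to least similar to C006.
ranked = sorted(customers, key=lambda cid: distance(customers[cid], new_customer))
for cid in ranked:
    print(cid, round(distance(customers[cid], new_customer), 2))
```

The ranking only suggests which existing customers C006 resembles; deciding what the two groups are, and naming them, is still your job.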

Dataset 2.3: Email Classification

Business Goal: Predict the category of a new email.

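| Email ID | Contains "Free"? | Contains "Click"? | From Known Sender? | Category    |
|----------|------------------|-------------------|--------------------|-------------|
| E001     | Yes              | Yes               | No                 | Spam        |
| E002     | No               | No                | Yes                | Important   |
| E003     | Yes              | No                | Yes                | Promotional |
| E004     | No               | Yes               | No                 | Spam        |
| E005     | No               | No                | Yes                | Important   |
| E006     | Yes              | Yes               | Yes                | Promotional |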

NEW EMAIL: E007 - Contains "Free": No, Contains "Click": Yes, From Known Sender: Yes
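
A quick tally of categories per feature value can show which features separate the categories cleanly. The sketch below counts categories for each (feature, value) pair and checks whether any historical email has exactly the same feature values as E007. The feature names used as keys are assumptions chosen for this illustration.

```python
from collections import Counter, defaultdict

# Dataset 2.3 rows: (contains "Free", contains "Click", from known sender, category)
emails = [
    (True,  True,  False, "Spam"),         # E001
    (False, False, True,  "Important"),    # E002
    (True,  False, True,  "Promotional"),  # E003
    (False, True,  False, "Spam"),         # E004
    (False, False, True,  "Important"),    # E005
    (True,  True,  True,  "Promotional"),  # E006
]
FEATURE_NAMES = ("contains_free", "contains_click", "known_sender")

# Count how often each category appears for every (feature, value) pair.
tally = defaultdict(Counter)
for *features, category in emails:
    for name, value in zip(FEATURE_NAMES, features):
        tally[(name, value)][category] += 1

for key in sorted(tally):
    print(key, dict(tally[key]))

# Does any historical email share all three feature values with E007?
new_email = (False, True, True)
exact_matches = [row[3] for row in emails if row[:3] == new_email]
print("Exact matches for E007:", exact_matches)
```

When no historical row matches exactly, you need rules that generalize, so decide which feature you trust most and justify the choice in your answer.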

Exercise 3: Business Decision Framework

Decision Framework

[Figure: Machine Learning Decision Framework]

Scenario 3.1: Healthcare Analytics

A hospital wants to improve patient care and operational efficiency. They have patient data including demographics, medical history, treatment costs, length of stay, and patient satisfaction scores.

Multiple Questions to Explore:

  1. "What types of patients do we treat?" (Unknown patient groups)
  2. "Will this patient be readmitted within 30 days?" (Yes/No)
  3. "How much will this patient's treatment cost?" (Dollar amount)

Question 3.1a Analysis: "What types of patients do we treat?"

Question 3.1b Analysis: "Will this patient be readmitted within 30 days?"

Question 3.1c Analysis: "How much will this patient's treatment cost?"