Week 2: Deep Learning Fundamentals

CP3501 - Deep Learning

Building Your First Image Classifier

Today's Goals

  • Understand machine learning types and tasks
  • Learn how deep learning processes data
  • Build a bear classifier using FastAI
  • Evaluate and improve the model
Important: We will learn CONCEPTS first with sample data, then see the code. Don't worry if you're new to this - we'll build from zero.

What is Machine Learning?

Traditional Programming

We write explicit rules:

  • If it has round ears → teddy bear
  • If it has a hump → grizzly
  • If it's all black → black bear

Problem: Rules break easily. What if the lighting changes? What if the bear is partially hidden?

Machine Learning

We show examples:

  • Here are 100 teddy bear photos
  • Here are 100 grizzly photos
  • Here are 100 black bear photos

Benefit: Computer learns patterns automatically, handles variations better.

Machine Learning = Teaching by examples, not by writing rules

Two Main Types of Machine Learning

Supervised Learning

We provide labeled examples

Like a teacher giving answers

Example:

Image + "This is a grizzly bear"

Image + "This is a black bear"

Result: Model learns to predict labels for new data

Unsupervised Learning

We provide unlabeled examples

Like letting students discover patterns

Example:

Many images (no labels)

Model finds groups/patterns

Result: Model discovers structure in data

This course focuses on supervised learning. Our bear classifier uses labeled images.

Two Types of Supervised Learning

Classification

Predict a category

Output: One of several discrete options

Examples:

  • Is this email spam or not spam?
  • Which bear type is this?
  • Is this tumor benign or malignant?
  • What digit is in this image?

Regression

Predict a number

Output: A continuous value

Examples:

  • What will the house price be?
  • How old is this person?
  • What will the temperature be tomorrow?
  • How many sales will there be next month?
Our bear classifier is classification: We predict one of three categories (black, grizzly, teddy)

Classification vs Regression: Same Data, Different Questions

Sample Data: People

Name     Age  Height (cm)  Occupation
Alice    28   165          Engineer
Bob      35   180          Teacher
Charlie  42   175          Doctor
Diana    31   170          Engineer

Classification Question

"Given age and height, predict occupation"

Output: Engineer, Teacher, or Doctor

Regression Question

"Given height and occupation, predict age"

Output: A number like 33.5 years
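
The two questions can be framed in code by picking a different target column from the same rows. This is a minimal sketch in plain Python; the variable names (people, classification_pairs, regression_pairs) are illustrative and not part of any library.

```python
# The sample table from above, one dict per person.
people = [
    {"name": "Alice",   "age": 28, "height_cm": 165, "occupation": "Engineer"},
    {"name": "Bob",     "age": 35, "height_cm": 180, "occupation": "Teacher"},
    {"name": "Charlie", "age": 42, "height_cm": 175, "occupation": "Doctor"},
    {"name": "Diana",   "age": 31, "height_cm": 170, "occupation": "Engineer"},
]

# Classification: inputs are (age, height), the target is a category.
classification_pairs = [((p["age"], p["height_cm"]), p["occupation"]) for p in people]

# Regression: inputs are (height, occupation), the target is a number.
regression_pairs = [((p["height_cm"], p["occupation"]), p["age"]) for p in people]

print(classification_pairs[0])  # ((28, 165), 'Engineer')
print(regression_pairs[0])      # ((165, 'Engineer'), 28)
```

The data never changes; only the choice of what to predict decides whether the task is classification or regression.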

Our Task: Bear Image Classification

Sample Data Structure

Image File    Label (Category)
bear_001.jpg  black
bear_002.jpg  grizzly
bear_003.jpg  teddy
bear_004.jpg  ???

Task: Given images 001-003 with labels, predict the label for image 004

Type: Supervised Learning (we have labels)

Sub-type: Classification (predicting a category)

Categories: 3 options (black, grizzly, teddy)

How Does Deep Learning Work?

Deep learning uses artificial neural networks - layers of connected processing units.

INPUT LAYER: Receives the data (e.g., bear image as numbers)
HIDDEN LAYERS: Extract features and patterns (edges, textures, shapes, objects)
OUTPUT LAYER: Makes the prediction (black, grizzly, or teddy)

"Deep" means many layers. More layers let the network learn more complex patterns.

Data Flow: Image → Prediction

Step 1: Input (Image as Numbers)

Images are converted to numbers (pixels). Simplified 4×4 grayscale example:

245 240 238 250
230  65  70 235
225  60  68 230
235 220 225 240

Each number = brightness (0=black, 255=white)

Real bear images: 224 × 224 pixels × 3 colors = 150,528 numbers!
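
The same idea in code: a minimal sketch that stores the 4×4 grayscale example as a nested Python list and checks the pixel arithmetic. The variable names are illustrative only.

```python
# The 4x4 grayscale example: each value is a brightness from 0 (black) to 255 (white).
image = [
    [245, 240, 238, 250],
    [230,  65,  70, 235],
    [225,  60,  68, 230],
    [235, 220, 225, 240],
]

height, width = len(image), len(image[0])
darkest = min(min(row) for row in image)
print(height * width)    # 16 numbers in the tiny example
print(darkest)           # 60, part of the dark patch in the middle

# A 224 x 224 colour image has three values (R, G, B) per pixel.
values_per_image = 224 * 224 * 3
print(values_per_image)  # 150528
```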

Step 2: Processing Through Layers

Each layer transforms the data:

Layer 1: Detects edges (horizontal, vertical)
Layer 2: Combines edges into shapes
Layer 3: Recognizes parts (ear, nose)

Each layer learns automatically during training

  • Early layers: simple patterns (edges, colors)
  • Middle layers: textures and parts
  • Later layers: complete objects
We don't program these features. The network learns them from examples!

Step 3: Output (Prediction)

Final layer produces probabilities for each category:

Category      Probability  Percentage
Black Bear    0.05         5%
Grizzly Bear  0.88         88%
Teddy Bear    0.07         7%

Prediction: Grizzly Bear (highest probability)

Confidence: 88% certain

Probabilities always sum to 1.0 (or 100%). The model is saying "I'm 88% sure this is a grizzly."
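
The slides do not say how the final layer produces probabilities. A common approach in classifiers is the softmax function, sketched below under that assumption. The raw scores are made up for illustration and were chosen so the result lands close to the table above.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

categories = ["black", "grizzly", "teddy"]
raw_scores = [0.0, 2.87, 0.34]  # hypothetical scores, not from a real model

probs = softmax(raw_scores)
prediction = categories[probs.index(max(probs))]

for name, p in zip(categories, probs):
    print(f"{name}: {p:.2f}")
print("Prediction:", prediction)
```

The prediction is simply the category with the largest probability.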

How Training Works: Learning from Mistakes

Training Example

Image         True Label  Model Prediction  Correct?
bear_001.jpg  black       grizzly (70%)     ❌ Wrong
bear_002.jpg  grizzly     grizzly (85%)     ✓ Correct
bear_003.jpg  teddy       teddy (95%)       ✓ Correct

1. Model makes predictions
2. Compare to true labels (measure error)
3. Adjust network to reduce error
4. Repeat thousands of times
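
The four steps can be sketched with a toy model that has a single adjustable number instead of a whole network. This is an assumption-laden illustration, not how FastAI trains internally: it uses made-up data, mean squared error as the error measure, and plain gradient descent as the adjustment rule.

```python
# Toy data: the "true" rule is y = 2 * x. The model must discover the 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # the model's single adjustable number, starting out wrong
learning_rate = 0.01  # how big each adjustment is

for step in range(1000):
    # 1. Model makes predictions
    preds = [weight * x for x in xs]
    # 2. Compare to the true values (mean squared error)
    error = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. Adjust the weight in the direction that reduces the error
    gradient = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    weight -= learning_rate * gradient
    # 4. Repeat

print(round(weight, 3))  # 2.0
```

A real network repeats the same loop with millions of weights instead of one.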

Key Concept: Tensors

A tensor is just a multi-dimensional array of numbers.

1D Tensor (Vector)

[5, 10, 15, 20]

Example: List of temperatures

2D Tensor (Matrix)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Example: Grayscale image

3D Tensor: Color Image

224 pixels (height) × 224 pixels (width) × 3 colors (RGB) = 150,528 numbers in one image

This is what the neural network actually processes!
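
To make the dimensions concrete, the sketch below builds each kind of tensor from nested Python lists and reports its shape. The shape helper is a hypothetical function written for this example; real libraries provide their own equivalent.

```python
def shape(tensor):
    """Return the size of each dimension of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

vector = [5, 10, 15, 20]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# A blank 224 x 224 colour image: height x width x 3 channels, all zeros.
colour_image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]

print(shape(vector))        # (4,)
print(shape(matrix))        # (3, 3)
print(shape(colour_image))  # (224, 224, 3)
```

Libraries differ in axis order. PyTorch, which FastAI builds on, typically stores an image as channels × height × width, but the count of numbers is the same.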

SLO1: Understanding tensors is fundamental to deep learning. Everything is numbers!

Supervised Learning: Our Approach

We will use supervised learning: teaching the computer using labeled examples.

Input: Bear image

Label: "This is a grizzly bear"

Model learns: What makes this a grizzly?

Result: Can identify grizzlies in new photos

Real Example from Today

We will give the computer:

  • 150 bear images (50 black, 50 grizzly, 50 teddy)
  • Each image labeled with correct bear type
  • Computer learns patterns that distinguish them

Key Vocabulary

Dataset

The collection of examples we use to teach the computer. For us: 150 bear images.

Model

The "brain" that learns from data. It's a neural network that can make predictions.

Training

The process where the model learns from examples. Like a student studying before an exam.

Prediction

When the model looks at a new image and tells us what it thinks the bear type is.

Epoch

One complete pass through all training data. Training for 4 epochs = seeing all images 4 times.

Analogy: Training is like studying for an exam. The dataset is your textbook. The model is your brain. Prediction is answering exam questions.

Training vs Testing: Why Split?

We divide our dataset into two parts:

Training Set (80%)

The model learns from these examples.

Like: Study materials before exam

Testing/Validation Set (20%)

The model is evaluated on these images, which it never trains on. FastAI calls this held-out portion the validation set (valid_pct=0.2 in the code).

Like: The actual exam questions

Why Do This?

We want to know if the model truly learned, or just memorized.

Bad Scenario (No Test Set)

Student memorizes exact textbook answers. Gets 100% on questions from textbook. But fails when asked slightly different questions.

Good Scenario (With Test Set)

Student understands concepts. Can answer new questions they've never seen. This is real learning.

Manual Calculation: Dataset Split

Let's practice with our bear dataset.

Total images: 150
Split ratio: 80% training, 20% testing
Training images: 150 × 0.80 = 120 images
Testing images: 150 × 0.20 = 30 images

Per Category

Black bears: 50 total → 40 training + 10 testing
Grizzly bears: 50 total → 40 training + 10 testing
Teddy bears: 50 total → 40 training + 10 testing

Key Point: The model will learn from 120 images, then prove it learned by correctly identifying 30 images it has never seen.
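
The split arithmetic can be checked with a short sketch. It assumes a simple random shuffle over stand-in image IDs. The seed value and variable names are arbitrary choices for illustration.

```python
import random

image_ids = list(range(150))  # stand-ins for 150 image files
random.seed(42)               # fixed seed so the split is repeatable
random.shuffle(image_ids)

n_valid = len(image_ids) * 20 // 100  # 20% held out
valid_ids = image_ids[:n_valid]
train_ids = image_ids[n_valid:]

print(len(train_ids), len(valid_ids))  # 120 30
```

A purely random split gives exactly 120 and 30 images overall, but the per-category counts (40 + 10) are what we expect on average. A particular shuffle may hold out slightly more of one bear type than another.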

How Do We Know It Works?

Accuracy

Percentage of correct predictions.

Accuracy = (Correct predictions) / (Total predictions)

Example:
Model tested on 30 images
Gets 27 correct
Accuracy = 27 / 30 = 0.90 = 90%

Error Rate

Percentage of incorrect predictions.

Error Rate = 1 - Accuracy

Example: 90% accuracy
Error Rate = 1 - 0.90 = 0.10 = 10%

Our FastAI code reports error_rate as its metric. Lower is better.
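
Both formulas fit in a few lines of Python. The helper names below are hypothetical and work from raw counts. They are not FastAI's own error_rate metric, which is computed from predictions and labels during training.

```python
def accuracy_from_counts(correct, total):
    """Fraction of predictions that were right."""
    return correct / total

def error_rate_from_counts(correct, total):
    """Fraction of predictions that were wrong."""
    return 1 - accuracy_from_counts(correct, total)

print(accuracy_from_counts(27, 30))              # 0.9
print(round(error_rate_from_counts(27, 30), 2))  # 0.1
```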

Practice: Calculate Error Rate

Scenario 1

Model tested on 40 images, gets 36 correct.

What is the error rate?

Correct = 36
Total = 40
Accuracy = 36/40 = 0.90
Error rate = 1 - 0.90 = 0.10 = 10%

Scenario 2

Model tested on 50 images, gets 45 correct.

What is the error rate?

Correct = 45
Total = 50
Accuracy = 45/50 = 0.90
Error rate = 1 - 0.90 = 0.10 = 10%

Confusion Matrix: Where Are the Mistakes?

Error rate tells us HOW MANY mistakes. Confusion matrix tells us WHICH mistakes.

Simple Example

                 Predicted: Black  Predicted: Grizzly
Actual: Black    8                 2
Actual: Grizzly  1                 9

Reading This Table
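
One way to read the simple example is to put it into code. In this sketch the rows are the actual classes and the columns are the predicted classes, so the diagonal holds the correct predictions. The variable names are illustrative.

```python
labels = ["black", "grizzly"]
# Rows are actual classes, columns are predicted classes.
confusion = [
    [8, 2],  # actual black:   8 predicted black, 2 predicted grizzly
    [1, 9],  # actual grizzly: 1 predicted black, 9 predicted grizzly
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(labels)))  # diagonal = correct
accuracy = correct / total

# Per class: of the images that truly belong to it, what fraction was found?
found = {labels[i]: confusion[i][i] / sum(confusion[i]) for i in range(len(labels))}

print(total, correct, accuracy)  # 20 17 0.85
print(found)                     # {'black': 0.8, 'grizzly': 0.9}
```

The overall accuracy hides the detail that black bears are missed more often than grizzlies in this example.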

Why Different Mistakes Matter

Acceptable Mistake

Confusing black bear ↔ grizzly

Both are real bears, can look similar in photos

Problematic Mistake

Confusing teddy bear → real bear

Very different! The model has a serious problem.

Key Insight: Two models with the same error rate (10%) can perform very differently. One might make reasonable mistakes while the other makes dangerous ones.

This connects to SLO4: Data Ethics - we need to understand WHERE the model fails, not just how often.

Data Augmentation

Problem: We only have 150 images. Can we get more without downloading?

Yes! Create Variations

Take one bear image and create variations:

  • Rotate it slightly
  • Flip it horizontally
  • Crop different parts
  • Adjust brightness
  • Change colors slightly

Result: From 120 training images, we can generate thousands of variations.

Data augmentation = Creating realistic variations of existing data to help the model learn better
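
Two of the variations listed above, flipping and brightness adjustment, can be sketched on the small grayscale grid from earlier. The functions are hypothetical helpers for illustration. FastAI's own transforms work on real images and do considerably more.

```python
image = [
    [245, 240, 238, 250],
    [230,  65,  70, 235],
    [225,  60,  68, 230],
    [235, 220, 225, 240],
]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def adjust_brightness(img, delta):
    """Add delta to every pixel, keeping values inside 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

flipped = flip_horizontal(image)
brighter = adjust_brightness(image, 20)

print(flipped[1])   # [235, 70, 65, 230]
print(brighter[0])  # [255, 255, 255, 255]
```

Each variation still shows the same content, so it keeps the same label.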

RandomResizedCrop Strategy

Special augmentation technique that crops random portions of the image.

Epoch 1: Crop top-left corner of bear image

Epoch 2: Crop center of same bear image

Epoch 3: Crop bottom-right of same bear image

Result: Model learns to recognize bears even when only part of the bear is in view

Each training cycle (epoch) shows the model a DIFFERENT crop of the same image. This prevents memorization and improves learning.
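
The per-epoch behaviour can be imitated with a simplified random crop. This sketch only randomizes the position of a fixed-size window. The real RandomResizedCrop also varies the crop size and resizes the result, which is left out here. The function and the 8×8 stand-in image are assumptions made for illustration.

```python
import random

def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window from a random position in img."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]

# An 8x8 stand-in image whose pixel values encode their own position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

rng = random.Random(0)  # seeded so the run is repeatable
crops = []
for epoch in range(3):
    crop = random_crop(image, 4, 4, rng)
    crops.append(crop)
    print(f"epoch {epoch}: top-left pixel value {crop[0][0]}")
```

Because the window position is drawn again every epoch, the model rarely sees exactly the same pixels twice.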

Now Let's See the Code

Everything we just learned appears in FastAI. Let's connect concepts to code.

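    # IMPORT FASTAI'S VISION TOOLS
    from fastai.vision.all import *

    # CREATING THE DATASET
    # `path` is assumed to be set earlier in the notebook: a folder with one subfolder per bear type
    bears = DataBlock(
        blocks=(ImageBlock, CategoryBlock),               # Image input, Category output
        get_items=get_image_files,                        # Find all image files under path
        splitter=RandomSplitter(valid_pct=0.2),           # 80/20 split!
        get_y=parent_label,                               # Get label from folder name
        item_tfms=RandomResizedCrop(224, min_scale=0.5),  # Augmentation!
        batch_tfms=aug_transforms()                       # More augmentation!
    )

    # CREATING THE DATALOADERS (applies the training/testing split)
    dls = bears.dataloaders(path, bs=32)  # bs = batch size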

Can you identify which line creates the 80/20 split?

Training the Model

    # BUILD THE MODEL (using pre-trained ResNet18)
    learn = vision_learner(dls, resnet18, metrics=error_rate)

    # TRAIN THE MODEL (4 epochs = 4 complete passes through data,
    # after a default 1-epoch warm-up with the pre-trained layers frozen)
    learn.fine_tune(4)

What Happens During Training?

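fine_tune(4) means "train for 4 complete cycles through all the data". By default, FastAI first runs one extra warm-up epoch that trains only the new output layers while the pre-trained layers stay frozen.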

Reading FastAI Training Output

Epoch  Training Loss  Validation Loss  Error Rate
0      0.89           0.45             0.15
1      0.52           0.31             0.10
2      0.38           0.25             0.08
3      0.29           0.23             0.07

What We See
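
A small sketch can turn the table into the quantities discussed earlier. It copies the four rows by hand, converts each error rate to accuracy, and checks whether the validation loss falls every epoch. The variable names are illustrative.

```python
# (epoch, train_loss, valid_loss, error_rate) copied from the table above.
history = [
    (0, 0.89, 0.45, 0.15),
    (1, 0.52, 0.31, 0.10),
    (2, 0.38, 0.25, 0.08),
    (3, 0.29, 0.23, 0.07),
]

for epoch, train_loss, valid_loss, err in history:
    print(f"epoch {epoch}: accuracy {1 - err:.0%}")

valid_losses = [row[2] for row in history]
still_improving = all(b < a for a, b in zip(valid_losses, valid_losses[1:]))
print(still_improving)  # True
```

In this run both losses and the error rate fall every epoch, ending at 7% error, which is 93% accuracy on the held-out images.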

Evaluating Our Model

FastAI provides tools to understand where the model fails.

    # CREATE INTERPRETATION OBJECT
    interp = ClassificationInterpretation.from_learner(learn)

    # SHOW CONFUSION MATRIX
    interp.plot_confusion_matrix()

    # SHOW WORST PREDICTIONS
    interp.plot_top_losses(9, nrows=3)

What These Show

Counter-Intuitive Approach

Train FIRST, clean data SECOND

Why?

The model finds bad data faster than you can by manual inspection.

1. Download images (some will be bad)

2. Train quick model (don't clean first!)

3. Look at top losses (model shows you bad images)

4. Clean those specific images

5. Retrain on clean data
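
Step 3 depends on ranking images by loss. The sketch below shows the idea with hypothetical results and cross-entropy as the assumed loss. The file names and probabilities are invented for illustration and do not come from a real run. In practice, plot_top_losses does this ranking for you.

```python
import math

# Hypothetical validation results: (file name, true label, probability the
# model assigned to that true label). None of these values are real.
results = [
    ("bear_010.jpg", "grizzly", 0.95),
    ("bear_011.jpg", "black",   0.02),  # confidently wrong: worth a look
    ("bear_012.jpg", "teddy",   0.88),
    ("bear_013.jpg", "black",   0.40),
]

def cross_entropy(p_true):
    """Loss for one image: large when the true label got a low probability."""
    return -math.log(p_true)

ranked = sorted(results, key=lambda r: cross_entropy(r[2]), reverse=True)
for name, label, p in ranked[:2]:
    print(name, label, round(cross_entropy(p), 2))
```

Images at the top of the ranking are either hard cases or bad data, such as a wrong label or a photo that is not a bear at all, which is why they are the first ones to inspect.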

Interactive Data Cleaning

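    import shutil
    from fastai.vision.widgets import ImageClassifierCleaner

    # LAUNCH CLEANING GUI
    cleaner = ImageClassifierCleaner(learn)
    cleaner

    # APPLY YOUR CHANGES (run in a separate cell, after making selections in the GUI)
    for idx in cleaner.delete():
        cleaner.fns[idx].unlink()                     # delete the file
    for idx, cat in cleaner.change():
        shutil.move(str(cleaner.fns[idx]), path/cat)  # move to the correct label folder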

What the GUI Does

What is Batch Size?

Models don't look at one image at a time. They look at groups (batches).

Example: bs=32

With 120 training images and batch size 32:

Number of batches = 120 / 32 = 3.75, rounded up to 4 batches

Batch 1: Images 1-32
Batch 2: Images 33-64
Batch 3: Images 65-96
Batch 4: Images 97-120 (only 24 images)
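
The batch arithmetic can be reproduced with list slicing. This is a minimal sketch over stand-in image numbers. It keeps the smaller final batch, whereas some data loaders can be configured to drop an incomplete last batch.

```python
import math

n_images = 120
batch_size = 32

n_batches = math.ceil(n_images / batch_size)  # round up, not to nearest
print(n_batches)  # 4

image_ids = list(range(1, n_images + 1))
batches = [image_ids[i:i + batch_size] for i in range(0, n_images, batch_size)]
for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: images {batch[0]}-{batch[-1]} ({len(batch)} images)")
```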

Why Batches?

Why ResNet18?

We use resnet18 - a model that already knows about images.

Training from Scratch

Model knows nothing about images.

Needs to learn: What is an edge? A curve? Fur? An eye?

Time: Many hours, thousands of images

Transfer Learning (ResNet18)

Model already knows about images.

Already learned: edges, textures, shapes, objects

Time: Minutes, hundreds of images

ResNet18 was pre-trained on about 1.2 million images from the ImageNet dataset. We just teach it the specific task: "Which type of bear?"

We'll learn more about this next week.

Now: Hands-On Workshop

Part 1: Run the Notebook (40 minutes)

Part 2: Experimentation (40 minutes)

Focus on UNDERSTANDING what happens, not just running code. Ask questions!

Check Your Understanding

After today, you should be able to:

Skill                                            Can I do this?
Distinguish supervised vs unsupervised learning  [ ]
Distinguish classification vs regression         [ ]
Explain how data flows through neural network    [ ]
Describe training and testing split              [ ]
Calculate error rate from results                [ ]
Run and interpret FastAI classifier              [ ]

How Today Connects to Subject Learning Outcomes

SLO1: Core Concepts

SLO2: Build and Train Models

SLO4: Data Ethics

Next Steps

Today's Workshop

This Week's Work

Next Week

Key Takeaways

1. Supervised learning uses labeled examples; classification predicts categories, regression predicts numbers
2. Deep learning uses layers of neurons to transform data: input → hidden layers → output
3. Images are tensors (multi-dimensional arrays of numbers)
4. Always split data: Train on 80%, test on 20%
5. Confusion matrix shows WHERE mistakes happen, not just how many
6. Train first, then clean data (counter-intuitive but effective!)