Week 2: Deep Learning Fundamentals

CP3501 - Deep Learning

Building Your First Image Classifier

Today's Goals

Understand machine learning types and tasks
Learn how deep learning processes data
Build a bear classifier using FastAI
Evaluate and improve the model

Important: We will learn CONCEPTS first with sample data, then see the code. Don't worry if you're new to this - we'll build from zero.

What is Machine Learning?

Traditional Programming

We write explicit rules:

If it has round ears → teddy bear
If it has a hump → grizzly
If it's all black → black bear

Problem: Rules break easily. What if the lighting changes? What if the bear is partially hidden?
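To see why hand-written rules are brittle, here is a minimal sketch of the three rules above as Python. The feature names (round_ears, has_hump, all_black) are hypothetical inputs invented for this illustration; a real program would first have to extract them from pixels, which is the hard part.

```python
# Hand-written rules from the slide, applied to hypothetical pre-extracted features.
def classify_by_rules(round_ears, has_hump, all_black):
    if round_ears:
        return "teddy bear"
    if has_hump:
        return "grizzly"
    if all_black:
        return "black bear"
    return "unknown"  # no rule matched

# A clear photo of a grizzly: the hump is visible.
clear_photo = classify_by_rules(round_ears=False, has_hump=True, all_black=False)

# Same grizzly, but the hump is hidden behind a tree: no rule fires.
hidden_hump = classify_by_rules(round_ears=False, has_hump=False, all_black=False)

print(clear_photo)   # grizzly
print(hidden_hump)   # unknown
```

One hidden feature is enough to make the rules give up, and every new situation would need another hand-written rule.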

Machine Learning

We show examples:

Here are 100 teddy bear photos
Here are 100 grizzly photos
Here are 100 black bear photos

Benefit: Computer learns patterns automatically, handles variations better.

Machine Learning = Teaching by examples, not by writing rules

Two Main Types of Machine Learning

Supervised Learning

We provide labeled examples

Like a teacher giving answers

Example:

Image + "This is a grizzly bear"

Image + "This is a black bear"

Result: Model learns to predict labels for new data

Unsupervised Learning

We provide unlabeled examples

Like letting students discover patterns

Example:

Many images (no labels)

Model finds groups/patterns

Result: Model discovers structure in data

This course focuses on supervised learning. Our bear classifier uses labeled images.

Two Types of Supervised Learning

Classification

Predict a category

Output: One of several discrete options

Examples:

Is this email spam or not spam?
Which bear type is this?
Is this tumor benign or malignant?
What digit is in this image?

Regression

Predict a number

Output: A continuous value

Examples:

What will the house price be?
How old is this person?
What will the temperature be tomorrow?
How many sales will there be next month?

Our bear classifier is classification: We predict one of three categories (black, grizzly, teddy)

Classification vs Regression: Same Data, Different Questions

Sample Data: People

Name	Age	Height (cm)	Occupation
Alice	28	165	Engineer
Bob	35	180	Teacher
Charlie	42	175	Doctor
Diana	31	170	Engineer

Classification Question

"Given age and height, predict occupation"

Output: Engineer, Teacher, or Doctor

Regression Question

"Given height and occupation, predict age"

Output: A number like 33.5 years
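The difference in output type can be sketched with the sample table. This is not deep learning: the classifier below simply copies the occupation of the most similar person (nearest neighbour), and the regressor averages the ages of people with the same occupation, ignoring height for simplicity. Both function names are made up for this example.

```python
# Sample data from the table: (name, age, height_cm, occupation)
people = [
    ("Alice", 28, 165, "Engineer"),
    ("Bob", 35, 180, "Teacher"),
    ("Charlie", 42, 175, "Doctor"),
    ("Diana", 31, 170, "Engineer"),
]

def predict_occupation(age, height):
    """Classification: return the occupation of the most similar person."""
    def distance(person):
        _, p_age, p_height, _ = person
        return (p_age - age) ** 2 + (p_height - height) ** 2
    return min(people, key=distance)[3]

def predict_age(occupation):
    """Regression: return the average age of people with this occupation."""
    ages = [p_age for _, p_age, _, p_occ in people if p_occ == occupation]
    return sum(ages) / len(ages)

print(predict_occupation(age=30, height=168))  # a category: Engineer
print(predict_age("Engineer"))                 # a number: 29.5
```

The classifier can only ever answer with one of the occupations it has seen. The regressor can return any number, including ones that appear nowhere in the table.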

Our Task: Bear Image Classification

Sample Data Structure

Image File	Label (Category)
bear_001.jpg	black
bear_002.jpg	grizzly
bear_003.jpg	teddy
bear_004.jpg	???

Task: Given images 001-003 with labels, predict the label for image 004

Type: Supervised Learning (we have labels)

Sub-type: Classification (predicting a category)

Categories: 3 options (black, grizzly, teddy)

How Does Deep Learning Work?

Deep learning uses artificial neural networks - layers of connected processing units.

INPUT LAYER
Receives the data (e.g., bear image as numbers)

↓

HIDDEN LAYERS
Extract features and patterns
(edges, textures, shapes, objects)

↓

OUTPUT LAYER
Makes prediction (black, grizzly, or teddy)

"Deep" means many layers. More layers = can learn more complex patterns.

Data Flow: Image → Prediction

Step 1: Input (Image as Numbers)

Images are converted to numbers (pixels). Simplified 4×4 grayscale example:

245	240	238	250
230	65	70	235
225	60	68	230
235	220	225	240

Each number = brightness (0=black, 255=white)

Real bear images: 224 × 224 pixels × 3 colors = 150,528 numbers!

Step 2: Processing Through Layers

Each layer transforms the data:

Layer 1
Detects edges
(horizontal, vertical)

→

Layer 2
Combines edges
into shapes

→

Layer 3
Recognizes
parts (ear, nose)

Each layer learns automatically during training

Early layers: simple patterns (edges, colors)
Middle layers: textures and parts
Later layers: complete objects

We don't program these features. The network learns them from examples!
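To make "detects edges" concrete, here is a hand-written sketch using the 4×4 grayscale example from Step 1. It measures how sharply brightness changes between neighbouring pixels. A real network learns its own filters during training; this fixed difference is only an illustration of what an edge looks like numerically.

```python
# The 4x4 grayscale example: bright border, dark patch in the two middle rows.
image = [
    [245, 240, 238, 250],
    [230,  65,  70, 235],
    [225,  60,  68, 230],
    [235, 220, 225, 240],
]

def horizontal_edge_strength(row):
    """Absolute brightness change between each pixel and its right neighbour."""
    return [abs(right - left) for left, right in zip(row, row[1:])]

edges = [horizontal_edge_strength(row) for row in image]
for row in edges:
    print(row)
# [5, 2, 12]
# [165, 5, 165]
# [165, 8, 162]
# [15, 5, 15]
```

The large values (around 165) mark where the bright background meets the dark patch. Small values mean the brightness barely changes, so there is no edge.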

Step 3: Output (Prediction)

Final layer produces probabilities for each category:

Category	Probability	Percentage
Black Bear	0.05	5%
Grizzly Bear	0.88	88%
Teddy Bear	0.07	7%

Prediction: Grizzly Bear (highest probability)

Confidence: 88% certain

Probabilities always sum to 1.0 (or 100%). The model is saying "I'm 88% sure this is a grizzly."
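One common way to turn a network's raw output scores into probabilities is the softmax function. The sketch below assumes softmax is used and feeds it made-up scores, so the printed values are illustrative and are not the numbers from the table.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

categories = ["black", "grizzly", "teddy"]
raw_scores = [0.5, 3.4, 0.8]  # made-up numbers for illustration

probabilities = softmax(raw_scores)
# The prediction is the category with the highest probability.
prediction = categories[probabilities.index(max(probabilities))]

for name, p in zip(categories, probabilities):
    print(f"{name}: {p:.2f}")
print("Prediction:", prediction)
```

Whatever scores go in, the outputs are all between 0 and 1 and add up to 1, which is why they can be read as percentages.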

How Training Works: Learning from Mistakes

Training Example

Image	True Label	Model Prediction	Correct?
bear_001.jpg	black	grizzly (70%)	❌ Wrong
bear_002.jpg	grizzly	grizzly (85%)	✓ Correct
bear_003.jpg	teddy	teddy (95%)	✓ Correct

1. Model makes predictions

→

2. Compare to true labels (measure error)

→

3. Adjust network to reduce error

→

4. Repeat thousands of times
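The four steps can be sketched on a toy problem far smaller than an image classifier. The example below assumes a "model" with a single adjustable number w that should learn y = 2 × x, and uses gradient descent (covered properly next week) for step 3. All names and values are invented for illustration.

```python
# Toy data: the "right answer" is y = 2 * x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0              # the single adjustable number (a real network has millions)
learning_rate = 0.05

def mean_squared_error(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_error = mean_squared_error(w)

for step in range(100):                                  # 4. repeat
    predictions = [w * x for x in xs]                    # 1. make predictions
    errors = [p - y for p, y in zip(predictions, ys)]    # 2. compare to true values
    # 3. adjust w in the direction that reduces the error (gradient of the MSE)
    gradient = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    w -= learning_rate * gradient

final_error = mean_squared_error(w)
print(f"w = {w:.3f}, error went from {initial_error:.2f} to {final_error:.6f}")
```

Nobody told the loop that the answer is 2. It got there by repeatedly measuring its own error and nudging w to shrink it.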

Key Concept: Tensors

A tensor is just a multi-dimensional array of numbers.

1D Tensor (Vector)

[5, 10, 15, 20]

Example: List of temperatures

2D Tensor (Matrix)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Example: Grayscale image

3D Tensor: Color Image

224 pixels (height) × 224 pixels (width) × 3 colors (RGB) = 150,528 numbers in one image

This is what the neural network actually processes!

SLO1: Understanding tensors is fundamental to deep learning. Everything is numbers!
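A quick way to get a feel for tensor dimensions is to build the three examples as nested Python lists and measure them. The shape helper below is written for this illustration only; libraries such as PyTorch provide a .shape attribute that does the same job.

```python
def shape(tensor):
    """Return the size of each dimension of a (regular) nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

vector = [5, 10, 15, 20]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# A blank 224 x 224 colour image: height x width x 3 colour channels.
image = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]

print(shape(vector))   # (4,)
print(shape(matrix))   # (3, 3)
print(shape(image))    # (224, 224, 3)

height, width, channels = shape(image)
total_numbers = height * width * channels
print(total_numbers)   # 150528
```

Note that PyTorch usually stores images with the colour channels first, as 3 × 224 × 224. The count of numbers is the same either way.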

Supervised Learning: Our Approach

We will use supervised learning: teaching the computer using labeled examples.

Input: Bear image

↓

Label: "This is a grizzly bear"

↓

Model learns: What makes this a grizzly?

↓

Result: Can identify grizzlies in new photos

Real Example from Today

We will give the computer:

150 bear images (50 black, 50 grizzly, 50 teddy)
Each image labeled with correct bear type
Computer learns patterns that distinguish them

Key Vocabulary

Dataset

The collection of examples we use to teach the computer. For us: 150 bear images.

Model

The "brain" that learns from data. It's a neural network that can make predictions.

Training

The process where the model learns from examples. Like a student studying before an exam.

Prediction

When the model looks at a new image and tells us what it thinks the bear type is.

Epoch

One complete pass through all training data. Training for 4 epochs = seeing all images 4 times.

Analogy: Training is like studying for an exam. The dataset is your textbook. The model is your brain. Prediction is answering exam questions.

Training vs Testing: Why Split?

We divide our dataset into two parts:

Training Set (80%)

The model learns from these examples.

Like: Study materials before exam

Testing/Validation Set (20%)

The model is evaluated on these. It has never seen them before.

Like: The actual exam questions

Why Do This?

We want to know if the model truly learned, or just memorized.

Bad Scenario (No Test Set)

Student memorizes exact textbook answers. Gets 100% on questions from textbook. But fails when asked slightly different questions.

Good Scenario (With Test Set)

Student understands concepts. Can answer new questions they've never seen. This is real learning.

Manual Calculation: Dataset Split

Let's practice with our bear dataset.

Total images: 150
Split ratio: 80% training, 20% testing
Training images: 150 × 0.80 = 120 images
Testing images: 150 × 0.20 = 30 images

Per Category

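Black bears: 50 total → 40 training + 10 testing
Grizzly bears: 50 total → 40 training + 10 testing
Teddy bears: 50 total → 40 training + 10 testing

(This assumes the split is even across categories. A purely random split will usually be close to, but not exactly, 40/10 per category.)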

Key Point: The model will learn from 120 images, then prove it learned by correctly identifying 30 images it has never seen.
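The split itself is just "shuffle, then cut". Here is a minimal sketch in plain Python, with made-up file names standing in for the 150 bear images. The split_dataset function and its seed parameter are hypothetical helpers, not FastAI code.

```python
import random

# Hypothetical file names standing in for the 150 bear images.
all_images = [f"bear_{i:03d}.jpg" for i in range(1, 151)]

def split_dataset(items, valid_pct=0.2, seed=42):
    """Shuffle the items, then hold out valid_pct of them for validation."""
    shuffled = items[:]                     # copy so the original list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives the same split every run
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

train_images, valid_images = split_dataset(all_images)
print(len(train_images), len(valid_images))  # 120 30
```

Shuffling first matters: if the files were sorted by bear type and we cut without shuffling, the held-out 20% could contain only one category.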

How Do We Know It Works?

Accuracy

Percentage of correct predictions.

Accuracy = (Correct predictions) / (Total predictions)

Example:
Model tested on 30 images
Gets 27 correct
Accuracy = 27 / 30 = 0.90 = 90%

Error Rate

Percentage of incorrect predictions.

Error Rate = 1 - Accuracy

Example:
90% accuracy
Error Rate = 1 - 0.90 = 0.10 = 10%

Our FastAI code reports error_rate because we pass it as the metric (metrics=error_rate). Lower is better.
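Both formulas fit in a couple of lines of Python. The two helper functions below are written for this example and work from raw counts; they are not FastAI's own metric functions.

```python
def accuracy_from_counts(correct, total):
    """Fraction of predictions that were right."""
    return correct / total

def error_rate_from_counts(correct, total):
    """Fraction of predictions that were wrong."""
    return 1 - accuracy_from_counts(correct, total)

print(f"{accuracy_from_counts(27, 30):.0%}")    # 90%
print(f"{error_rate_from_counts(27, 30):.0%}")  # 10%
```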

Practice: Calculate Error Rate

Scenario 1

Model tested on 40 images, gets 36 correct.

What is the error rate?

Correct = 36
Total = 40
Accuracy = 36/40 = 0.90
Error rate = 1 - 0.90 = 0.10 = 10%

Scenario 2

Model tested on 50 images, gets 45 correct.

What is the error rate?

Correct = 45
Total = 50
Accuracy = 45/50 = 0.90
Error rate = 1 - 0.90 = 0.10 = 10%

Confusion Matrix: Where Are the Mistakes?

Error rate tells us HOW MANY mistakes. Confusion matrix tells us WHICH mistakes.

Simple Example

	Predicted: Black	Predicted: Grizzly
Actual: Black	8	2
Actual: Grizzly	1	9

Reading This Table

Diagonal (8 and 9): Correct predictions
Off-diagonal (2 and 1): Mistakes
The model mistook 2 black bears for grizzlies
The model mistook 1 grizzly for a black bear
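A confusion matrix is only a count of (actual, predicted) pairs. The sketch below rebuilds the table from two hypothetical lists of labels chosen to match its numbers.

```python
from collections import Counter

# Hypothetical test results that reproduce the table: 10 black bears, 10 grizzlies.
actual    = ["black"] * 10 + ["grizzly"] * 10
predicted = ["black"] * 8 + ["grizzly"] * 2 + ["black"] * 1 + ["grizzly"] * 9

counts = Counter(zip(actual, predicted))   # (actual, predicted) -> how many times

labels = ["black", "grizzly"]
# Rows are actual labels, columns are predicted labels.
matrix = [[counts[(a, p)] for p in labels] for a in labels]
for label, row in zip(labels, matrix):
    print(f"Actual {label}: {row}")
# Actual black: [8, 2]
# Actual grizzly: [1, 9]

correct = sum(matrix[i][i] for i in range(len(labels)))   # diagonal
print("Accuracy:", correct / len(actual))                 # 17 / 20 = 0.85
```

Summing the diagonal gives the number of correct predictions, so accuracy and error rate can always be recovered from the matrix, but not the other way round.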

Why Different Mistakes Matter

Acceptable Mistake

Confusing black bear ↔ grizzly

Both are real bears, can look similar in photos

Problematic Mistake

Confusing teddy bear → real bear

Very different! The model has a serious problem.

Key Insight: Two models with the same error rate (10%) can perform very differently. One might make reasonable mistakes while the other makes dangerous ones.

This connects to SLO4: Data Ethics - we need to understand WHERE the model fails, not just how often.

Data Augmentation

Problem: We only have 150 images. Can we get more without downloading?

Yes! Create Variations

Take one bear image and create variations:

Rotate it slightly
Flip it horizontally
Crop different parts
Adjust brightness
Change colors slightly

Result: From 120 training images, we can generate thousands of variations.

Data augmentation = Creating realistic variations of existing data to help the model learn better
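Two of the variations listed above, a horizontal flip and a brightness change, can be sketched on a tiny grayscale grid in plain Python. The function names are made up for this illustration; FastAI applies its own, more sophisticated versions through aug_transforms.

```python
# A small 4x4 grayscale image (0 = black, 255 = white).
image = [
    [245, 240, 238, 250],
    [230,  65,  70, 235],
    [225,  60,  68, 230],
    [235, 220, 225, 240],
]

def flip_horizontal(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def adjust_brightness(img, delta):
    """Add delta to every pixel, keeping values inside 0..255."""
    return [[min(255, max(0, pixel + delta)) for pixel in row] for row in img]

flipped = flip_horizontal(image)
darker = adjust_brightness(image, -30)

print(flipped[1])  # [235, 70, 65, 230]
print(darker[1])   # [200, 35, 40, 205]
```

The label does not change: a flipped or slightly darker grizzly is still a grizzly, which is what makes these variations safe to train on.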

RandomResizedCrop Strategy

Special augmentation technique that crops random portions of the image.

Epoch 1: Crop top-left corner of bear image

↓

Epoch 2: Crop center of same bear image

↓

Epoch 3: Crop bottom-right of same bear image

↓

Result: Model learns to recognize bears from different crops and framings

Each training cycle (epoch) shows the model a DIFFERENT crop of the same image. This prevents memorization and improves learning.
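The idea of a fresh crop each epoch can be sketched as follows. This simplified version only varies the crop position; FastAI's RandomResizedCrop also varies the crop size (controlled by min_scale) and resizes the result back to the target size. The random_crop function is a hypothetical helper for this example.

```python
import random

def random_crop(img, crop_size, rng):
    """Cut a crop_size x crop_size window from a random position."""
    max_top = len(img) - crop_size
    max_left = len(img[0]) - crop_size
    top = rng.randint(0, max_top)
    left = rng.randint(0, max_left)
    return [row[left:left + crop_size] for row in img[top:top + crop_size]]

# An 8x8 stand-in image whose pixel value encodes its position (row * 10 + col).
image = [[row * 10 + col for col in range(8)] for row in range(8)]

rng = random.Random(0)  # seeded so the run is repeatable
crops = [random_crop(image, 4, rng) for epoch in range(3)]
for epoch, crop in enumerate(crops, start=1):
    print(f"Epoch {epoch}: top-left pixel = {crop[0][0]}")
```

Every crop has the same size, so the network always receives input of the same shape, but the content shifts from epoch to epoch.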

Now Let's See the Code

Everything we just learned appears in FastAI. Let's connect concepts to code.

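# SETUP (assumption: the standard FastAI vision import)
from fastai.vision.all import *

# Assumption: path points to a folder with one sub-folder per bear type,
# e.g. bears/black, bears/grizzly, bears/teddy. Adjust to your own folder.
path = Path('bears')

# CREATING THE DATASET
bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),  # Image input, Category output
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2),  # 80/20 split!
    get_y=parent_label,  # Get label from folder name
    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # Augmentation!
    batch_tfms=aug_transforms()  # More augmentation!
)

# CREATING TRAINING/TESTING SPLIT
dls = bears.dataloaders(path, bs=32)  # bs = batch size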

Can you identify which line creates the 80/20 split?

Training the Model

# BUILD THE MODEL (using pre-trained ResNet18)
learn = vision_learner(dls, resnet18, metrics=error_rate)

# TRAIN THE MODEL (4 epochs = 4 complete passes through data)
learn.fine_tune(4)
            

What Happens During Training?

Model looks at all 120 training images
Makes predictions, checks mistakes
Adjusts itself to do better
Repeats 4 times (4 epochs)
After each epoch, tests on 30 validation images

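fine_tune(4) means "train for 4 complete cycles through all the training data". By default FastAI first runs one extra warm-up epoch that trains only the new final layers, so the notebook shows that epoch in a separate table before the 4 main epochs.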

Reading FastAI Training Output

Epoch	Training Loss	Validation Loss	Error Rate
0	0.89	0.45	0.15
1	0.52	0.31	0.10
2	0.38	0.25	0.08
3	0.29	0.23	0.07

What We See

Loss decreasing: Model is learning (getting better at minimizing mistakes)
Error rate decreasing: Making fewer mistakes on validation set
Epoch 3: 7% error rate = 93% accuracy
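The same reading of the table can be done programmatically. The sketch below copies the four rows by hand and checks the two things we look for: accuracy per epoch and whether validation loss keeps falling.

```python
# (epoch, training loss, validation loss, error rate) copied from the table.
results = [
    (0, 0.89, 0.45, 0.15),
    (1, 0.52, 0.31, 0.10),
    (2, 0.38, 0.25, 0.08),
    (3, 0.29, 0.23, 0.07),
]

accuracies = [round(1 - err, 2) for _, _, _, err in results]
valid_losses = [valid for _, _, valid, _ in results]

# True if validation loss dropped at every epoch.
still_improving = all(b < a for a, b in zip(valid_losses, valid_losses[1:]))

for (epoch, _, _, err), acc in zip(results, accuracies):
    print(f"Epoch {epoch}: error {err:.0%} -> accuracy {acc:.0%}")
print("Validation loss fell every epoch:", still_improving)
```

If validation loss started rising while training loss kept falling, that would be a sign the model is beginning to memorize the training images.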

Evaluating Our Model

FastAI provides tools to understand where the model fails.

# CREATE INTERPRETATION OBJECT
interp = ClassificationInterpretation.from_learner(learn)

# SHOW CONFUSION MATRIX
interp.plot_confusion_matrix()

# SHOW WORST PREDICTIONS
interp.plot_top_losses(9, nrows=3)
            

What These Show

Confusion matrix: Which bear types get confused
Top losses: Images where model was most wrong
Both help us find bad data or real limitations
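"Top losses" is a sorted list. The sketch below assumes the loss is cross-entropy, the usual choice for classification, where the loss for one image is the negative log of the probability the model gave to the true label. The file names and probabilities are invented for illustration.

```python
import math

# Hypothetical validation results:
# (file, true label, probability the model gave to the true label)
results = [
    ("bear_101.jpg", "grizzly", 0.95),
    ("bear_102.jpg", "teddy",   0.02),  # model was almost sure it was something else
    ("bear_103.jpg", "black",   0.60),
    ("bear_104.jpg", "black",   0.10),
]

def loss(prob_of_true_label):
    """Cross-entropy for one image: small when the model trusted the right label."""
    return -math.log(prob_of_true_label)

# Highest loss first: these are the images worth inspecting by hand.
top_losses = sorted(results, key=lambda r: loss(r[2]), reverse=True)
for name, label, prob in top_losses[:2]:
    print(f"{name}  true={label}  loss={loss(prob):.2f}")
```

An image near the top of this list is either genuinely hard or mislabelled, which is why top losses are a good place to start cleaning data.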

Counter-Intuitive Approach

Train FIRST, clean data SECOND

Why?

The model finds bad data faster than you can by manual inspection.

1. Download images (some will be bad)

↓

2. Train quick model (don't clean first!)

↓

3. Look at top losses (model shows you bad images)

↓

4. Clean those specific images

↓

5. Retrain on clean data

Interactive Data Cleaning

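# SETUP
import shutil
from fastai.vision.widgets import ImageClassifierCleaner

# LAUNCH CLEANING GUI (run in a notebook cell)
cleaner = ImageClassifierCleaner(learn)
cleaner

# APPLY YOUR CHANGES (run in a later cell, after making selections in the GUI)
# Delete the images you marked for deletion
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()

# Move re-labelled images into the folder of their new category
for idx,cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)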

What the GUI Does

Shows images ordered by loss (worst first)
You can mark images for deletion
You can move images to different categories
Code above applies your decisions

What is Batch Size?

Models don't look at one image at a time. They look at groups (batches).

Example: bs=32

With 120 training images and batch size 32:

Number of batches = 120 / 32 = 3.75, rounded up to 4 batches
Batch 1: Images 1-32
Batch 2: Images 33-64
Batch 3: Images 65-96
Batch 4: Images 97-120 (only 24 images)
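The batch arithmetic above amounts to slicing a list into chunks. A minimal sketch, with a hypothetical make_batches helper and image numbers standing in for the images:

```python
def make_batches(items, batch_size):
    """Cut a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

image_ids = list(range(1, 121))          # images 1..120
batches = make_batches(image_ids, 32)

print(len(batches))                      # 4
print([len(b) for b in batches])         # [32, 32, 32, 24]
```

In the notebook the batches will not be these exact ranges. FastAI shuffles the training images each epoch, and its training loader by default skips a final batch that is not full, so counting the training batches there may show 3 instead of 4.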

Why Batches?

More efficient for GPU processing
Helps model learn better (sees multiple examples at once)
Typical values: 16, 32, 64

Why ResNet18?

We use resnet18 - a model that already knows about images.

Training from Scratch

Model knows nothing about images.

Needs to learn: What is an edge? A curve? Fur? An eye?

Time: Many hours, thousands of images

Transfer Learning (ResNet18)

Model already knows about images.

Already learned: edges, textures, shapes, objects

Time: Minutes, hundreds of images

The pre-trained ResNet18 was trained on ImageNet, a dataset of about 1.2 million images. We just teach it the specific task: "Which type of bear?"

We'll learn more about this next week.

Now: Hands-On Workshop

Part 1: Run the Notebook (40 minutes)

Open the FastAI notebook
Run each cell and observe outputs
Record: How many training images? Error rate?
Look at confusion matrix and top losses

Part 2: Experimentation (40 minutes)

Change number of epochs (try 1, try 8)
Use data cleaning tool
Fill in the results table

Focus on UNDERSTANDING what happens, not just running code. Ask questions!

Check Your Understanding

After today, you should be able to:

Skill	Can I do this?
Distinguish supervised vs unsupervised learning
Distinguish classification vs regression
Explain how data flows through neural network
Describe training and testing split
Calculate error rate from results
Run and interpret FastAI classifier

How Today Connects to Subject Learning Outcomes

SLO1: Core Concepts

Tensors: Images as multi-dimensional arrays
Loss functions: How we measure mistakes
Optimization: Model improving through training
Transfer learning: Using ResNet18

SLO2: Build and Train Models

Dataset preparation: Bears with correct labels
Augmentations: RandomResizedCrop, aug_transforms
Metrics: error_rate for evaluation
Validation strategies: 80/20 split

SLO4: Data Ethics

Understanding where models fail
Data quality matters (cleaning)
Different mistakes have different consequences

Next Steps

Today's Workshop

Complete the hands-on notebook
Fill in the experimentation table
Clean your data and retrain

This Week's Work

Complete the worksheet (calculations and concepts)
Try training on your own dataset (optional)
Review the glossary of terms

Next Week

Deeper dive into tensors and operations
How does gradient descent work?
Understanding neural network architectures

Key Takeaways

1. Supervised learning uses labeled examples; classification predicts categories, regression predicts numbers

2. Deep learning uses layers of neurons to transform data: input → hidden layers → output

3. Images are tensors (multi-dimensional arrays of numbers)

4. Always split data: for example, train on 80% and test on 20%

5. Confusion matrix shows WHERE mistakes happen, not just how many

6. Train first, then clean data (counter-intuitive but effective!)