Week 2: Deep Learning Fundamentals
CP3501 - Deep Learning
Building Your First Image Classifier
Today's Goals
- Understand machine learning types and tasks
- Learn how deep learning processes data
- Build a bear classifier using FastAI
- Evaluate and improve the model
Important: We will learn CONCEPTS first with sample data, then see the code. Don't worry if you're new to this - we'll build from zero.
What is Machine Learning?
Traditional Programming
We write explicit rules:
- If it has round ears → teddy bear
- If it has a hump → grizzly
- If it's all black → black bear
Problem: Rules break easily. What if the lighting changes? What if the bear is partially hidden?
Machine Learning
We show examples:
- Here are 100 teddy bear photos
- Here are 100 grizzly photos
- Here are 100 black bear photos
Benefit: Computer learns patterns automatically, handles variations better.
Machine Learning = Teaching by examples, not by writing rules
Two Main Types of Machine Learning
Supervised Learning
We provide labeled examples
Like a teacher giving answers
Example:
Image + "This is a grizzly bear"
Image + "This is a black bear"
Result: Model learns to predict labels for new data
Unsupervised Learning
We provide unlabeled examples
Like letting students discover patterns
Example:
Many images (no labels)
Model finds groups/patterns
Result: Model discovers structure in data
This course focuses on supervised learning. Our bear classifier uses labeled images.
Two Types of Supervised Learning
Classification
Predict a category
Output: One of several discrete options
Examples:
- Is this email spam or not spam?
- Which bear type is this?
- Is this tumor benign or malignant?
- What digit is in this image?
Regression
Predict a number
Output: A continuous value
Examples:
- What will the house price be?
- How old is this person?
- What will the temperature be tomorrow?
- How many sales will there be next month?
Our bear classifier is classification: We predict one of three categories (black, grizzly, teddy)
Classification vs Regression: Same Data, Different Questions
Sample Data: People
| Name | Age | Height (cm) | Occupation |
| Alice | 28 | 165 | Engineer |
| Bob | 35 | 180 | Teacher |
| Charlie | 42 | 175 | Doctor |
| Diana | 31 | 170 | Engineer |
Classification Question
"Given age and height, predict occupation"
Output: Engineer, Teacher, or Doctor
Regression Question
"Given height and occupation, predict age"
Output: A number like 33.5 years
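To make the difference concrete, here is a minimal sketch that asks both kinds of question of the sample table. The two "models" are deliberately crude stand-ins chosen for illustration (a nearest-neighbour lookup and a per-occupation average), not neural networks, and the regression part ignores height to stay short. The query values (age 30, height 168) are made up.

```python
# The sample table as a list of records.
people = [
    {"name": "Alice",   "age": 28, "height": 165, "occupation": "Engineer"},
    {"name": "Bob",     "age": 35, "height": 180, "occupation": "Teacher"},
    {"name": "Charlie", "age": 42, "height": 175, "occupation": "Doctor"},
    {"name": "Diana",   "age": 31, "height": 170, "occupation": "Engineer"},
]

def closest_person(age, height):
    """Return the record with the smallest squared distance in (age, height)."""
    return min(people, key=lambda p: (p["age"] - age) ** 2 + (p["height"] - height) ** 2)

# Classification: the output is one of a fixed set of categories.
predicted_occupation = closest_person(age=30, height=168)["occupation"]

# Regression: the output is a number (here, the mean age of known engineers).
engineer_ages = [p["age"] for p in people if p["occupation"] == "Engineer"]
predicted_age = sum(engineer_ages) / len(engineer_ages)

print(predicted_occupation)  # a category
print(predicted_age)         # a continuous value
```

The data is the same in both cases; only the type of the answer changes.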
Our Task: Bear Image Classification
Sample Data Structure
| Image File | Label (Category) |
| bear_001.jpg | black |
| bear_002.jpg | grizzly |
| bear_003.jpg | teddy |
| bear_004.jpg | ??? |
Task: Given images 001-003 with labels, predict the label for image 004
Type: Supervised Learning (we have labels)
Sub-type: Classification (predicting a category)
Categories: 3 options (black, grizzly, teddy)
How Does Deep Learning Work?
Deep learning uses artificial neural networks - layers of connected processing units.
INPUT LAYER
Receives the data (e.g., bear image as numbers)
↓
HIDDEN LAYERS
Extract features and patterns
(edges, textures, shapes, objects)
↓
OUTPUT LAYER
Makes prediction (black, grizzly, or teddy)
"Deep" means many layers. More layers = can learn more complex patterns.
Data Flow: Image → Prediction
Step 1: Input (Image as Numbers)
Images are converted to numbers (pixels). Simplified 4×4 grayscale example:
| 245 | 240 | 238 | 250 |
| 230 | 65 | 70 | 235 |
| 225 | 60 | 68 | 230 |
| 235 | 220 | 225 | 240 |
Each number = brightness (0=black, 255=white)
Real bear images: 224 × 224 pixels × 3 colors = 150,528 numbers!
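As a minimal sketch, the 4×4 grayscale example can be written as a nested Python list. The variable names are illustrative only and are not part of FastAI.

```python
# The 4x4 grayscale example as a nested list (one inner list per row).
image = [
    [245, 240, 238, 250],
    [230,  65,  70, 235],
    [225,  60,  68, 230],
    [235, 220, 225, 240],
]

height = len(image)
width = len(image[0])
num_values = height * width          # one brightness value per pixel

# Darkest pixel: the smallest brightness value and its (row, column) position.
darkest = min(
    (value, row, col)
    for row, line in enumerate(image)
    for col, value in enumerate(line)
)

# A 224 x 224 colour image stores 3 values (R, G, B) per pixel.
real_image_values = 224 * 224 * 3

print(height, width, num_values)
print(darkest)
print(real_image_values)
```

The dark values in the middle of the grid (60 to 70) are what a dark object on a bright background looks like to the network.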
Step 2: Processing Through Layers
Each layer transforms the data:
Layer 1: Detects edges (horizontal, vertical)
→
Layer 2: Combines edges into shapes
→
Layer 3: Recognizes parts (ear, nose)
Each layer learns automatically during training
- Early layers: simple patterns (edges, colors)
- Middle layers: textures and parts
- Later layers: complete objects
We don't program these features. The network learns them from examples!
Step 3: Output (Prediction)
Final layer produces probabilities for each category:
| Category | Probability | Percentage |
| Black Bear | 0.05 | 5% |
| Grizzly Bear | 0.88 | 88% |
| Teddy Bear | 0.07 | 7% |
Prediction: Grizzly Bear (highest probability)
Confidence: 88% certain
Probabilities always sum to 1.0 (or 100%). The model is saying "I'm 88% sure this is a grizzly."
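The slides do not say how the final layer turns its raw numbers into probabilities. One common choice for classification is the softmax function, sketched below under that assumption. The raw scores are made up for illustration and are not real model output.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["black", "grizzly", "teddy"]
raw_scores = [0.5, 3.4, 0.8]   # hypothetical output-layer values

probs = softmax(raw_scores)
best = max(range(len(labels)), key=lambda i: probs[i])
prediction = labels[best]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.2f}")
print("prediction:", prediction)
```

Whatever the raw scores are, the outputs are positive and sum to 1, and the largest score always gets the highest probability.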
How Training Works: Learning from Mistakes
Training Example
| Image | True Label | Model Prediction | Correct? |
| bear_001.jpg | black | grizzly (70%) | ❌ Wrong |
| bear_002.jpg | grizzly | grizzly (85%) | ✓ Correct |
| bear_003.jpg | teddy | teddy (95%) | ✓ Correct |
1. Model makes predictions
→
2. Compare to true labels (measure error)
→
3. Adjust network to reduce error
→
4. Repeat thousands of times
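The four steps above can be sketched on a toy problem. This is not the bear classifier: it is a one-number model that learns the rule y = 2x from four examples, using gradient descent (covered properly next week) to decide how to adjust. The data, learning rate, and step count are illustrative assumptions.

```python
# Toy data: the "true" rule is y = 2 * x, but the model does not know that.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

weight = 0.0          # the single adjustable number in this toy model
learning_rate = 0.01  # how big each adjustment is (illustrative value)

def mean_squared_error(w):
    return sum((w * x - y) ** 2 for x, y in zip(inputs, targets)) / len(inputs)

initial_error = mean_squared_error(weight)

for step in range(1000):                                    # 4. repeat many times
    predictions = [weight * x for x in inputs]              # 1. make predictions
    errors = [p - y for p, y in zip(predictions, targets)]  # 2. compare to true values
    # Gradient of the mean squared error with respect to the weight.
    gradient = 2 * sum(e * x for e, x in zip(errors, inputs)) / len(inputs)
    weight -= learning_rate * gradient                      # 3. adjust to reduce error

final_error = mean_squared_error(weight)
print(round(weight, 3), final_error < initial_error)
```

A real network repeats the same loop, but with millions of adjustable numbers instead of one.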
Key Concept: Tensors
A tensor is just a multi-dimensional array of numbers.
1D Tensor (Vector)
[5, 10, 15, 20]
Example: List of temperatures
2D Tensor (Matrix)
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]
Example: Grayscale image
3D Tensor: Color Image
224 pixels (height) × 224 pixels (width) × 3 colors (RGB)
= 150,528 numbers in one image
This is what the neural network actually processes!
SLO1: Understanding tensors is fundamental to deep learning. Everything is numbers!
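A minimal sketch of the three tensor examples, using plain nested lists rather than a tensor library. The helper `shape` is hypothetical (written for this example) and assumes the lists are non-empty and rectangular.

```python
def shape(tensor):
    """Return the size of each dimension of a nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

vector = [5, 10, 15, 20]                      # 1D: list of temperatures
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # 2D: tiny grayscale image
# 3D: a tiny 2 x 2 colour image, each pixel holds [R, G, B].
tiny_color_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

print(shape(vector))
print(shape(matrix))
print(shape(tiny_color_image))
```

The number of entries in the shape is the number of dimensions; multiplying the entries gives the total count of numbers, just as 224 × 224 × 3 = 150,528.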
Supervised Learning: Our Approach
We will use supervised learning: teaching the computer using labeled examples.
Input: Bear image
↓
Label: "This is a grizzly bear"
↓
Model learns: What makes this a grizzly?
↓
Result: Can identify grizzlies in new photos
Real Example from Today
We will give the computer:
- 150 bear images (50 black, 50 grizzly, 50 teddy)
- Each image labeled with correct bear type
- Computer learns patterns that distinguish them
Key Vocabulary
Dataset
The collection of examples we use to teach the computer. For us: 150 bear images.
Model
The "brain" that learns from data. It's a neural network that can make predictions.
Training
The process where the model learns from examples. Like a student studying before an exam.
Prediction
When the model looks at a new image and tells us what it thinks the bear type is.
Epoch
One complete pass through all training data. Training for 4 epochs = seeing all images 4 times.
Analogy: Training is like studying for an exam. The dataset is your textbook. The model is your brain. Prediction is answering exam questions.
Training vs Testing: Why Split?
We divide our dataset into two parts:
Training Set (80%)
The model learns from these examples.
Like: Study materials before exam
Testing/Validation Set (20%)
The model is evaluated on these. It has never seen them before.
Like: The actual exam questions
Why Do This?
We want to know if the model truly learned, or just memorized.
Bad Scenario (No Test Set)
Student memorizes exact textbook answers. Gets 100% on questions from textbook. But fails when asked slightly different questions.
Good Scenario (With Test Set)
Student understands concepts. Can answer new questions they've never seen. This is real learning.
Manual Calculation: Dataset Split
Let's practice with our bear dataset.
Total images: 150
Split ratio: 80% training, 20% testing
Training images: 150 × 0.80 = 120 images
Testing images: 150 × 0.20 = 30 images
Per Category (if the split is balanced; a purely random split gives these counts only approximately)
Black bears: 50 total → 40 training + 10 testing
Grizzly bears: 50 total → 40 training + 10 testing
Teddy bears: 50 total → 40 training + 10 testing
Key Point: The model will learn from 120 images, then prove it learned by correctly identifying 30 images it has never seen.
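The manual calculation can be checked with a short sketch. It splits each category separately so the per-category counts come out exactly as in the slide; the file names, the helper `split_category`, and the seed are assumptions made for illustration.

```python
import random

def split_category(items, valid_pct=0.2, seed=0):
    """Shuffle one category's items and split them into (train, valid)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_valid = round(len(items) * valid_pct)
    return items[n_valid:], items[:n_valid]

# Hypothetical file names: 50 images per bear type.
dataset = {
    label: [f"{label}_{i:03d}.jpg" for i in range(50)]
    for label in ("black", "grizzly", "teddy")
}

train, valid = [], []
for label, files in dataset.items():
    train_files, valid_files = split_category(files)
    train.extend(train_files)
    valid.extend(valid_files)
    print(label, len(train_files), len(valid_files))

print(len(train), len(valid))
```

No file appears in both lists, which is the whole point: the validation images stay unseen during training.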
How Do We Know It Works?
Accuracy
Percentage of correct predictions.
Accuracy = (Correct predictions) / (Total predictions)
Example: Model tested on 30 images
Gets 27 correct
Accuracy = 27 / 30 = 0.90 = 90%
Error Rate
Percentage of incorrect predictions.
Error Rate = 1 - Accuracy
Example: 90% accuracy
Error Rate = 1 - 0.90 = 0.10 = 10%
In our FastAI code we pass error_rate as the metric. Lower is better.
Practice: Calculate Error Rate
Scenario 1
Model tested on 40 images, gets 36 correct.
What is the error rate?
Correct = 36
Total = 40
Accuracy = 36/40 = 0.90
Error rate = 1 - 0.90 = 0.10 = 10%
Scenario 2
Model tested on 50 images, gets 45 correct.
What is the error rate?
Correct = 45
Total = 50
Accuracy = 45/50 = 0.90
Error rate = 1 - 0.90 = 0.10 = 10%
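The same arithmetic as a small sketch. The function names are made up for this example; `error_rate_from_counts` is not FastAI's `error_rate`, which works on model outputs rather than counts.

```python
def accuracy_from_counts(correct, total):
    """Fraction of predictions that were correct."""
    return correct / total

def error_rate_from_counts(correct, total):
    """Fraction of predictions that were wrong."""
    return 1 - accuracy_from_counts(correct, total)

# The worked example and the two practice scenarios.
for correct, total in [(27, 30), (36, 40), (45, 50)]:
    acc = accuracy_from_counts(correct, total)
    err = error_rate_from_counts(correct, total)
    print(f"{correct}/{total}: accuracy={acc:.2f}, error rate={err:.2f}")
```

All three cases give the same error rate even though the test sets differ in size, which is why a rate is easier to compare than a raw count of mistakes.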
Confusion Matrix: Where Are the Mistakes?
Error rate tells us HOW MANY mistakes. Confusion matrix tells us WHICH mistakes.
Simple Example
| | Predicted: Black | Predicted: Grizzly |
| Actual: Black | 8 | 2 |
| Actual: Grizzly | 1 | 9 |
Reading This Table
- Diagonal (8 and 9): Correct predictions
- Off-diagonal (2 and 1): Mistakes
- The model mistook 2 black bears for grizzlies
- The model mistook 1 grizzly for a black bear
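A minimal sketch of how such a table is counted from individual predictions. The two lists are hypothetical results constructed to reproduce the numbers in the table.

```python
from collections import Counter

# Hypothetical validation results matching the table above.
actual    = ["black"] * 10 + ["grizzly"] * 10
predicted = ["black"] * 8 + ["grizzly"] * 2 + ["black"] * 1 + ["grizzly"] * 9

labels = ["black", "grizzly"]
counts = Counter(zip(actual, predicted))     # (actual, predicted) -> how often

# Rows are actual labels, columns are predicted labels.
matrix = [[counts[(a, p)] for p in labels] for a in labels]
for label, row in zip(labels, matrix):
    print(f"actual {label}: {row}")

correct = sum(matrix[i][i] for i in range(len(labels)))   # diagonal entries
print("correct:", correct, "of", len(actual))
```

The error rate only needs the diagonal total; the off-diagonal cells are the extra information the confusion matrix adds.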
Why Different Mistakes Matter
Acceptable Mistake
Confusing black bear ↔ grizzly
Both are real bears, can look similar in photos
Problematic Mistake
Confusing teddy bear → real bear
Very different! Model has serious problem.
Key Insight: Two models with the same error rate (10%) can perform very differently. One might make reasonable mistakes, while the other makes dangerous ones.
This connects to SLO4: Data Ethics - we need to understand WHERE the model fails, not just how often.
Data Augmentation
Problem: We only have 150 images. Can we get more without downloading?
Yes! Create Variations
Take one bear image and create variations:
- Rotate it slightly
- Flip it horizontally
- Crop different parts
- Adjust brightness
- Change colors slightly
Result: From 120 training images, we can generate thousands of variations.
Data augmentation = Creating realistic variations of existing data to help the model learn better
RandomResizedCrop Strategy
Special augmentation technique that crops random portions of the image.
Epoch 1: Crop top-left corner of bear image
↓
Epoch 2: Crop center of same bear image
↓
Epoch 3: Crop bottom-right of same bear image
↓
Result: Model learns to recognize bears from many different crops and framings
Each training cycle (epoch) shows the model a DIFFERENT crop of the same image. This prevents memorization and improves learning.
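The idea can be sketched on a small grid of numbers. This is a simplified stand-in, not FastAI's RandomResizedCrop: it only picks a random position for a fixed-size window and skips the random scale and the resize back to 224 × 224. The helper `random_crop` and the 6×6 "image" are assumptions for illustration.

```python
import random

def random_crop(image, crop_size, rng):
    """Cut a random crop_size x crop_size window out of a 2D image."""
    max_top = len(image) - crop_size
    max_left = len(image[0]) - crop_size
    top = rng.randint(0, max_top)      # randint includes both end points
    left = rng.randint(0, max_left)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]

# A 6 x 6 "image" whose pixel values encode their own position (row * 10 + col).
image = [[row * 10 + col for col in range(6)] for row in range(6)]

rng = random.Random(0)   # fixed seed so the run is repeatable
crops = [random_crop(image, 4, rng) for epoch in range(3)]
for epoch, crop in enumerate(crops, start=1):
    print(f"epoch {epoch}: top-left pixel = {crop[0][0]}")
```

Because the window position is drawn again every time, the model rarely sees exactly the same pixels twice.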
Now Let's See the Code
Everything we just learned appears in FastAI. Let's connect concepts to code.
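from fastai.vision.all import *   # provides DataBlock, ImageBlock, get_image_files, ...

# path must point to a folder with one sub-folder of images per bear type
# (it is not defined on this slide)
bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # input: image, target: category
    get_items=get_image_files,                        # collect all image files under path
    splitter=RandomSplitter(valid_pct=0.2),
    get_y=parent_label,                               # label = name of the parent folder
    item_tfms=RandomResizedCrop(224, min_scale=0.5),  # random crop, resized to 224x224
    batch_tfms=aug_transforms()                       # extra augmentations, applied per batch
)
dls = bears.dataloaders(path, bs=32)                  # batches of 32 images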
Can you identify which line creates the 80/20 split?
Training the Model
learn = vision_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
What Happens During Training?
- Model looks at all 120 training images
- Makes predictions, checks mistakes
- Adjusts itself to do better
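- Repeats for 4 epochs
- After each epoch, tests on 30 validation images
fine_tune(4) means "train for 4 complete cycles through all the data". By default, FastAI first runs one extra epoch that trains only the newly added final layers (freeze_epochs=1), then runs the 4 epochs that update the whole network.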
Reading FastAI Training Output
| Epoch | Training Loss | Validation Loss | Error Rate |
| 0 | 0.89 | 0.45 | 0.15 |
| 1 | 0.52 | 0.31 | 0.10 |
| 2 | 0.38 | 0.25 | 0.08 |
| 3 | 0.29 | 0.23 | 0.07 |
What We See
- Loss decreasing: Model is learning (getting better at minimizing mistakes)
- Error rate decreasing: Making fewer mistakes on validation set
- Epoch 3: 7% error rate = 93% accuracy
Evaluating Our Model
FastAI provides tools to understand where the model fails.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(9, nrows=3)
What These Show
- Confusion matrix: Which bear types get confused
- Top losses: Images where model was most wrong
- Both help us find bad data or real limitations
Counter-Intuitive Approach
Train FIRST, clean data SECOND
Why?
The model finds bad data faster than you can by manual inspection.
1. Download images (some will be bad)
↓
2. Train quick model (don't clean first!)
↓
3. Look at top losses (model shows you bad images)
↓
4. Clean those specific images
↓
5. Retrain on clean data
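Step 3 amounts to sorting the validation images by loss. The sketch below assumes cross-entropy loss, a common choice for classification, computed here as the negative log of the probability the model gave to the true label. File names and probabilities are made up for illustration.

```python
import math

# Hypothetical validation results: (file name, true label, probability the
# model assigned to the true label). All values are invented.
results = [
    ("bear_010.jpg", "black",   0.92),
    ("bear_011.jpg", "grizzly", 0.03),
    ("bear_012.jpg", "teddy",   0.88),
    ("bear_013.jpg", "black",   0.40),
]

def cross_entropy(prob_of_true_label):
    """Small when the model backs the true label, large when it does not."""
    return -math.log(prob_of_true_label)

# Highest loss first: these are the images worth inspecting by hand.
ranked = sorted(results, key=lambda r: cross_entropy(r[2]), reverse=True)
for name, label, prob in ranked:
    print(f"{name} ({label}): loss={cross_entropy(prob):.2f}")
```

An image at the top of this list is either mislabeled, not a bear at all, or a genuinely hard case; all three are worth a look.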
Interactive Data Cleaning
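import shutil
from fastai.vision.widgets import ImageClassifierCleaner

cleaner = ImageClassifierCleaner(learn)
cleaner   # run in its own notebook cell to display the widget

# Run after making your choices in the widget:
for idx in cleaner.delete():          # remove files marked for deletion
    cleaner.fns[idx].unlink()
for idx, cat in cleaner.change():     # move files to the corrected category folder
    shutil.move(str(cleaner.fns[idx]), path/cat)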
What the GUI Does
- Shows images ordered by loss (worst first)
- You can mark images for deletion
- You can move images to different categories
- Code above applies your decisions
What is Batch Size?
Models don't look at one image at a time. They look at groups (batches).
Example: bs=32
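With 120 training images and batch size 32:
Number of batches = 120 / 32 = 3.75, rounded up to 4 batches
Batch 1: Images 1-32
Batch 2: Images 33-64
Batch 3: Images 65-96
Batch 4: Images 97-120 (only 24 images)
Note: By default, FastAI shuffles the training images every epoch and drops a final incomplete batch, so in practice it would use 3 full batches per epoch here.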
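A minimal sketch of the batching arithmetic, splitting image numbers 1 to 120 in order. The helper `make_batches` is hypothetical; real data loaders typically shuffle the images first rather than taking them in sequence.

```python
def make_batches(items, batch_size):
    """Split items into consecutive batches; the last one may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

image_ids = list(range(1, 121))          # images 1..120
batches = make_batches(image_ids, 32)

for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: images {batch[0]}-{batch[-1]} ({len(batch)} images)")
```

Changing the batch size to 16 or 64 in the call shows how the number of batches per epoch moves in the opposite direction.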
Why Batches?
- More efficient for GPU processing
- Gives steadier learning (each adjustment is based on several examples, not just one)
- Typical values: 16, 32, 64
Why ResNet18?
We use resnet18 - a model that already knows about images.
Training from Scratch
Model knows nothing about images.
Needs to learn: What is an edge? A curve? Fur? An eye?
Time: Many hours, thousands of images
Transfer Learning (ResNet18)
Model already knows about images.
Already learned: edges, textures, shapes, objects
Time: Minutes, hundreds of images
ResNet18 was pretrained on ImageNet, a dataset of about 1.2 million images. We just teach it the specific task: "Which type of bear?"
We'll learn more about this next week.
Now: Hands-On Workshop
Part 1: Run the Notebook (40 minutes)
- Open the FastAI notebook
- Run each cell and observe outputs
- Record: How many training images? Error rate?
- Look at confusion matrix and top losses
Part 2: Experimentation (40 minutes)
- Change number of epochs (try 1, try 8)
- Use data cleaning tool
- Fill in the results table
Focus on UNDERSTANDING what happens, not just running code. Ask questions!
Check Your Understanding
After today, you should be able to:
| Skill | Can I do this? |
| Distinguish supervised vs unsupervised learning | |
| Distinguish classification vs regression | |
| Explain how data flows through neural network | |
| Describe training and testing split | |
| Calculate error rate from results | |
| Run and interpret FastAI classifier | |
How Today Connects to Subject Learning Outcomes
SLO1: Core Concepts
- Tensors: Images as multi-dimensional arrays
- Loss functions: How we measure mistakes
- Optimization: Model improving through training
- Transfer learning: Using ResNet18
SLO2: Build and Train Models
- Dataset preparation: Bears with correct labels
- Augmentations: RandomResizedCrop, aug_transforms
- Metrics: error_rate for evaluation
- Validation strategies: 80/20 split
SLO4: Data Ethics
- Understanding where models fail
- Data quality matters (cleaning)
- Different mistakes have different consequences
Next Steps
Today's Workshop
- Complete the hands-on notebook
- Fill in the experimentation table
- Clean your data and retrain
This Week's Work
- Complete the worksheet (calculations and concepts)
- Try training on your own dataset (optional)
- Review the glossary of terms
Next Week
- Deeper dive into tensors and operations
- How does gradient descent work?
- Understanding neural network architectures
Key Takeaways
1. Supervised learning uses labeled examples; classification predicts categories, regression predicts numbers
2. Deep learning uses layers of neurons to transform data: input → hidden layers → output
3. Images are tensors (multi-dimensional arrays of numbers)
4. Always hold out data for evaluation: here we train on 80% and validate on 20%
5. Confusion matrix shows WHERE mistakes happen, not just how many
6. Train first, then clean data (counter-intuitive but effective!)