Convolutional Neural Networks for Image Classification

Week 8: DATA4800 - AI and Machine Learning

Understanding How Machines Learn to See

Learning Objectives

By the end of this week, you will be able to:

Explain why traditional neural networks struggle with image data
Understand the fundamental operation of convolution for pattern detection
Describe how CNNs extract hierarchical features from images
Identify the key components of CNN architecture (convolution, pooling, dense layers)
Apply pre-trained CNN models to business classification problems
Evaluate CNN performance using appropriate metrics
Recognize real-world business applications of CNN technology

Business Problem: Manufacturing Quality Control

The Challenge

An electronics manufacturer produces 50,000 circuit boards daily. Each board must be inspected for defects before shipment.

Traditional Manual Inspection

Human Inspector Performance:
• Speed: 100 boards per hour
• Accuracy: 85-90% (fatigue affects performance)
• Cost: $25 per hour × 500 inspectors × 24 hours = $300,000 daily
• Miss rate: 10-15% of defects go undetected

The Business Impact

Missed defects result in:

Product returns and warranty claims
Customer dissatisfaction and brand damage
Regulatory compliance issues
Lost revenue averaging $2M annually

CNN-Powered Solution: Automated Visual Inspection

Performance Comparison

Metric	Manual Inspection	CNN System	Improvement
Inspection Speed	100 boards/hour	10,000 boards/hour	100× faster
Accuracy	85-90%	98.5%	+8.5 to 13.5 percentage points
Consistency	Varies with fatigue	Constant 24/7	No degradation
Annual Cost	$7.5M (labor)	$500K (system)	93% cost reduction

Return on Investment: System pays for itself in 3 weeks through reduced labor costs and avoided defect costs.

Why CNNs Matter for Business

Healthcare Diagnostics

Use Case: Automated screening of medical images (X-rays, MRIs, CT scans)

Impact: Radiologists process 5× more cases with 15% higher detection rate for early-stage diseases

Value: Early detection saves lives and reduces treatment costs by 60%

Retail & E-commerce

Use Case: Automated product tagging and visual search

Impact: Catalog 100,000+ products automatically, enable customer image search

Value: 40% increase in product discovery, 25% boost in conversion rates

Agriculture

Use Case: Crop disease detection and yield prediction

Impact: Identify plant diseases 2 weeks earlier than traditional methods

Value: Prevent 30% crop loss, increase farm profitability by $150K annually

Security & Surveillance

Use Case: Facial recognition and anomaly detection

Impact: Monitor 1,000+ cameras simultaneously with real-time alerts

Value: Reduce security incidents by 70%, enable touchless access control

The Challenge: Why Traditional Neural Networks Fail for Images

Understanding Image Data

Let's examine what a computer "sees" when processing an image:

Small Image (28 × 28 pixels, grayscale):
• Total numbers: 28 × 28 = 784 pixel values
• Each pixel: Single number (0-255 for brightness)
• Network needs: 784 input neurons

Realistic Business Image (224 × 224 pixels, color):
• Total numbers: 224 × 224 × 3 (RGB channels) = 150,528 pixel values
• Each pixel: Three numbers (Red, Green, Blue values 0-255)
• Network needs: 150,528 input neurons

The Problem Grows Rapidly with Image Size

If we connect these inputs to just 1,000 neurons in the first hidden layer:

150,528 inputs × 1,000 neurons = 150,528,000 parameters (just in the first layer!)

The Parameter Explosion Problem

Network Comparison

Network Type	Input Size	Hidden Layer Size	Weights (biases excluded)	Issues
Traditional NN	784 (28×28)	128 neurons	100,352	Manageable
Traditional NN	150,528 (224×224×3)	128 neurons	19,267,584	Severe overfitting
Traditional NN	150,528	1,000 neurons	150,528,000	Impractical to train
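The counts in the table come from one multiplication per row: in a fully connected layer, every input connects to every neuron. The short Python sketch below reproduces them (the helper name dense_weights is illustrative, not from any library). Like the table, it counts weights only by default; the optional flag adds the one bias per neuron that a real layer would also carry.

```python
def dense_weights(n_inputs: int, n_neurons: int, include_bias: bool = False) -> int:
    """Count trainable values in one fully connected layer."""
    weights = n_inputs * n_neurons               # one weight per input-neuron connection
    biases = n_neurons if include_bias else 0    # optionally, one bias per neuron
    return weights + biases

# Rows of the comparison table: (input size, hidden neurons)
rows = [
    (28 * 28, 128),         # small grayscale image
    (224 * 224 * 3, 128),   # realistic RGB image
    (224 * 224 * 3, 1000),  # RGB image, wider hidden layer
]
for n_in, n_hidden in rows:
    print(f"{n_in:,} inputs x {n_hidden:,} neurons = {dense_weights(n_in, n_hidden):,} weights")
```

Running it prints 100,352, 19,267,584, and 150,528,000 weights, matching the three rows.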

Why This Fails

Overfitting: Too many parameters memorize training data instead of learning general patterns
No Spatial Understanding: Network treats pixels independently, ignoring that nearby pixels form meaningful patterns
Translation Variance: Same object in different image positions looks completely different to the network
Computational Cost: Training time becomes impractical, requiring massive computing resources

Learning from Human Vision: How Do We Recognize Objects?

The Human Approach

Consider how you recognize a cat in a photograph. You don't analyze every pixel individually. Instead, you follow a hierarchical process:

Step 1: Edges

Detect basic lines and boundaries

Step 2: Shapes

Combine edges into simple geometric forms

Step 3: Parts

Identify object components (ears, eyes, whiskers)


Step 4: Object

Recognize complete object: "This is a cat"

Key Insight: CNNs mimic this hierarchical process. They start with simple patterns (edges) and progressively build up to complex concepts (whole objects).

The CNN Solution: Hierarchical Feature Learning

Three Key Innovations

1. Local Connectivity

Instead of connecting to all pixels, each neuron only examines a small region (e.g., 3×3 pixels)

Benefit: Dramatically reduces parameters from millions to thousands

2. Parameter Sharing

Use the same "filter" (pattern detector) across the entire image

Benefit: Learns to detect patterns regardless of where they appear in the image

3. Hierarchical Learning

Stack multiple layers that learn increasingly complex features

Benefit: Automatically discovers relevant patterns without manual feature engineering

The Result

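In this week's example architecture, the three convolutional layers that extract features from a 224 × 224 × 3 image hold fewer than 100,000 parameters, whereas connecting the same image to a single 1,000-neuron fully connected layer takes over 150 million weights. The CNN also tends to learn better, more generalizable representations.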

Knowledge Check: Neural Networks and Images

Why do traditional fully-connected neural networks struggle with image classification tasks?

A) Images contain too little information for neural networks to learn from

B) The massive number of parameters leads to overfitting and ignores spatial relationships between pixels

C) Neural networks can only process numerical data, not images

D) Images are too expensive to process with neural networks

The Convolution Operation: Core Building Block

What is Convolution?

Convolution is a mathematical operation that applies a small filter (also called a kernel) across an image to detect specific patterns.

Business Analogy

Think of convolution like a quality inspector with a checklist:

The filter is the checklist of features to look for
The sliding window is moving the checklist across every part of the product
The output shows where those features were detected

Key Concept: Instead of looking at the entire image at once (which requires millions of parameters), convolution examines small regions one at a time using the same filter, dramatically reducing computational requirements.

How Convolution Works: Step-by-Step

The Convolution Process

Example: Detecting Vertical Edges

Input Image (5×5)

0	0	255	255	255
0	0	255	255	255
0	0	255	255	255
0	0	255	255	255
0	0	255	255	255

Dark (0) on left, Bright (255) on right

⊗

Vertical Edge Filter (3×3)

-1	0	1
-1	0	1
-1	0	1

Detects left-to-right brightness change

→

Output (3×3)

765	765	0
765	765	0
765	765	0

High values = vertical edge detected

Calculation for Top-Left Position

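The filter slides over the image one position at a time (stride 1, no padding, so a 5×5 input gives a 3×3 output). At each position, it multiplies the overlapping values element-wise and sums the results: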

(0×-1 + 0×0 + 255×1) + (0×-1 + 0×0 + 255×1) + (0×-1 + 0×0 + 255×1) = 765

This high value indicates a strong vertical edge was detected.
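The sliding-window arithmetic can be checked with a short pure-Python sketch. It assumes stride 1 and no padding, and it assumes the 5 × 5 input has two dark columns followed by three bright ones, the layout consistent with the top-left calculation above. The function name conv2d_valid is illustrative. Like CNN libraries, it does not flip the filter (strictly speaking, it computes a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding), summing element-wise products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# 5x5 input: two dark columns (0) followed by three bright columns (255)
image = [[0, 0, 255, 255, 255] for _ in range(5)]
vertical_edge = [[-1, 0, 1]] * 3

for row in conv2d_valid(image, vertical_edge):
    print(row)  # [765, 765, 0]
```

Each printed row is [765, 765, 0]. The two windows that straddle the dark-to-bright boundary respond strongly, while the rightmost window sees uniform brightness, so the filter's -1 and +1 columns cancel out.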


Business Insight: Different filters detect different patterns. In quality control, edge detectors find boundaries and defects, blur filters reduce noise, and sharpening filters enhance details. CNNs automatically learn the optimal filters for each task.

Using Multiple Filters for Comprehensive Detection

Why Multiple Filters?

A single filter can only detect one type of pattern. Real-world classification requires detecting many different features simultaneously.

Example: Circuit Board Inspection

Filter 1: Vertical Lines

Detects vertical traces and component edges

Business Value: Identifies misaligned components

Filter 2: Horizontal Lines

Detects horizontal traces and solder points

Business Value: Finds disconnected circuits

Filter 3: Circular Shapes

Detects capacitors and mounting holes

Business Value: Verifies component presence

Filter 4: Texture Patterns

Detects surface roughness and burn marks

Business Value: Identifies manufacturing defects

Typical CNN Layer: Uses 32-512 different filters simultaneously, creating a multi-dimensional representation of the image. Each filter learns to detect patterns that are useful for the classification task.
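To see how several filters combine, the sketch below applies three hand-written filters (vertical edge, horizontal edge, and a box blur) to one small patch and stacks the results into a height × width × depth output. The 6 × 6 patch and the filter values are illustrative assumptions; in a real CNN, the filter values are learned during training rather than written by hand.

```python
def conv2d_valid(image, kernel):
    """Stride-1, no-padding convolution with a square kernel (cross-correlation, as CNN libraries compute it)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 6x6 patch: a bright 2x2 "component" on a dark board
patch = [[0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        patch[r][c] = 255

filters = {
    "vertical_edge": [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
    "box_blur": [[1 / 9] * 3 for _ in range(3)],
}

# One feature map per filter; stacking them gives the layer's depth dimension
feature_maps = [conv2d_valid(patch, f) for f in filters.values()]
depth = len(feature_maps)
height, width = len(feature_maps[0]), len(feature_maps[0][0])
print(f"output shape: {height} x {width} x {depth}")  # 4 x 4 x 3
```

In the vertical-edge map, positive values mark the component's left edge (dark to bright) and negative values its right edge (bright to dark); the blur map simply averages each neighborhood. A layer with 64 filters works the same way, just with 64 stacked maps.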

Feature Maps: The Output of Convolution

Understanding Feature Maps

When a filter slides across an image, it produces a feature map (also called an activation map) that shows where and how strongly the pattern was detected.

Transformation Through Convolutional Layer

Input Image

224 × 224 × 3

(Height × Width × Channels)

→

Feature Maps

224 × 224 × 64

(64 different filters applied; the input is zero-padded so height and width stay at 224)

Interpretation: Each feature map highlights regions where its corresponding filter detected its pattern. Bright areas = strong detection, dark areas = weak/no detection.

Dimensionality

Input: Height × Width × Color Channels (3 for RGB)

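Output: Height × Width × Number of Filters (height and width match the input only when the border is zero-padded; without padding, a 3×3 filter trims one pixel from each edge)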

The number of filters (typically 32, 64, 128, 256, or 512) becomes the new "depth" dimension.
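The spatial size of each output map follows a standard formula, output = floor((input + 2 × padding − kernel) / stride) + 1, which applies to both convolution and pooling windows. The sketch below uses a hypothetical helper named output_size and a 28 × 28 input so the examples stay small; the depth is simply the number of filters for a convolution, and unchanged for pooling.

```python
def output_size(in_size: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Spatial output size along one dimension for a convolution or pooling window."""
    return (in_size + 2 * padding - kernel) // stride + 1

# 3x3 convolution, stride 1, no padding: one pixel lost on each side
print(output_size(28, kernel=3))              # 26
# 3x3 convolution with one pixel of zero-padding ("same" padding): size preserved
print(output_size(28, kernel=3, padding=1))   # 28
# 2x2 pooling window with stride 2: size halved
print(output_size(28, kernel=2, stride=2))    # 14
```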

Knowledge Check: Convolution Fundamentals

A convolutional layer applies 64 different 3×3 filters to a 224×224 RGB image. What is the shape of the resulting feature maps (assume no padding and a stride of 1)?

A) 224 × 224 × 3

B) 222 × 222 × 64

C) 64 × 64 × 224

D) 224 × 224 × 64

Hint: Consider that each filter produces one feature map with spatial dimensions slightly smaller than the input, and all filters are applied to the same input.

Complete CNN Architecture: Building Blocks

Three Main Components

1. Convolutional Layers

Function: Apply filters to detect patterns

Output: Feature maps showing where patterns were found

Parameters: Filter weights (learned during training)

↓

2. Pooling Layers

Function: Reduce spatial dimensions while preserving important features

Output: Downsampled feature maps

Parameters: None (fixed operation)

↓

3. Fully Connected (Dense) Layers

Function: Combine learned features to make final classification decision

Output: Class probabilities

Parameters: Connection weights (learned during training)

Design Pattern: Modern CNNs typically alternate between convolutional and pooling layers multiple times, progressively extracting more abstract features, before feeding into dense layers for final classification.

Hierarchical Feature Learning: From Edges to Objects

Progressive Abstraction Through Layers

Layer 1

Low-Level Features

Detects: Edges, lines, gradients, simple textures

Example: Horizontal/vertical boundaries, color transitions

Layer 2-3

Mid-Level Features

Detects: Corners, curves, simple shapes

Example: Circles, rectangles, T-junctions

Layer 4-5

High-Level Features

Detects: Object parts and assemblies, component outlines, repeated patterns, complex textures

Example: Wheels, faces, logos, product components

Dense Layers

Classification

Combines: All learned features to recognize complete objects

Output: "Defective Product" or "Quality Pass"

Key Insight: CNNs automatically learn this hierarchy without manual feature engineering. Early layers learn generic patterns useful for many tasks, while later layers learn task-specific features.

Example CNN Architecture: Product Quality Classifier

Layer-by-Layer Transformation

Layer	Operation	Output Shape	Parameters	What It Learns
Input	Product image	224 × 224 × 3	0	Raw pixel data (RGB)
Conv1	32 filters (3×3)	224 × 224 × 32	896	Basic edges and color gradients
Pool1	Max pooling (2×2)	112 × 112 × 32	0	Downsample while keeping features
Conv2	64 filters (3×3)	112 × 112 × 64	18,496	Simple shapes and corners
Pool2	Max pooling (2×2)	56 × 56 × 64	0	Further dimensionality reduction
Conv3	128 filters (3×3)	56 × 56 × 128	73,856	Component parts and patterns
Pool3	Max pooling (2×2)	28 × 28 × 128	0	Compact representation
Flatten	Reshape to vector	100,352	0	Prepare for dense layers
Dense1	128 neurons	128	12,845,184	Combine all features
Output	2 neurons (softmax)	2	258	Class probabilities: Defective/Pass

Total Parameters: ~13 million (vs. 150+ million for fully connected network)
Training Time: 2 hours on GPU vs. weeks for traditional approach
Accuracy: 98.5% vs. 75% for manual feature engineering
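The Parameters column can be reproduced from two standard formulas, assuming each convolutional filter and each dense neuron carries one bias term (the assumption that matches the table's numbers). Pooling and flatten layers have no parameters, so they are omitted. The helper names are illustrative.

```python
def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    """Each filter has kernel x kernel x in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(n_inputs: int, n_neurons: int) -> int:
    """One weight per input-neuron connection plus one bias per neuron."""
    return (n_inputs + 1) * n_neurons

layers = [
    ("Conv1", conv_params(3, 3, 32)),              # RGB input has 3 channels
    ("Conv2", conv_params(3, 32, 64)),
    ("Conv3", conv_params(3, 64, 128)),
    ("Dense1", dense_params(28 * 28 * 128, 128)),  # flattened 28x28x128 input
    ("Output", dense_params(128, 2)),
]
total = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:<7}{count:>12,}  ({count / total:.1%} of total)")
print(f"{'Total':<7}{total:>12,}")
```

The total is 12,938,690. Dense1 alone accounts for roughly 99% of it, while the three convolutional layers together hold 93,248 parameters, which is why later architectures often replace large flatten-plus-dense stages with pooling that shrinks the feature maps first.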

Pooling Layers: Efficient Dimensionality Reduction

Why Do We Need Pooling?

As we add more convolutional layers with more filters, the number of feature-map values and the computational cost grow rapidly. Pooling layers address this by:

Reducing spatial dimensions: Decreases image size while preserving important features
Controlling parameters: Fewer values to process in subsequent layers
Translation invariance: Small shifts in input don't drastically change output
Computational efficiency: Faster training and inference

Business Analogy

Think of pooling like creating an executive summary from a detailed report. You preserve the key findings and critical information while reducing the overall document size by 75%. The executive doesn't need every data point, just the most significant ones.

Common Pooling Operations

Max Pooling: Takes the maximum value in each region (most common)
Average Pooling: Takes the average value in each region
Typical window size: 2×2 with stride 2 (halves the height and width, reducing the number of values by 75%)

Max Pooling: Visual Demonstration

Example: 2×2 Max Pooling with Stride 2

Input Feature Map (4×4)

12	20	5	8
8	34	15	22
3	7	42	18
6	11	9	28

The four non-overlapping 2×2 windows are the top-left, top-right, bottom-left, and bottom-right quadrants

→

Output After Max Pooling (2×2)

34	22
11	42

Maximum from each 2×2 region

Step-by-Step Calculation

Top-Left Region

Values: 12, 20, 8, 34

Max: 34

Top-Right Region

Values: 5, 8, 15, 22

Max: 22

Bottom-Left Region

Values: 3, 7, 6, 11

Max: 11

Bottom-Right Region

Values: 42, 18, 9, 28

Max: 42

Result: Spatial dimensions reduced from 4×4 to 2×2 (75% reduction) while preserving the strongest activations (most important features detected by filters).
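A short pure-Python sketch of 2×2 pooling with stride 2, run on a 4×4 map whose windows contain the values listed in the step-by-step calculation. pool2d is a hypothetical helper: passing max gives max pooling, and passing an averaging function gives average pooling.

```python
def pool2d(feature_map, size=2, stride=2, pool_fn=max):
    """Apply pool_fn to each size x size window, moving stride cells at a time."""
    out = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(pool_fn(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

feature_map = [
    [12, 20, 5, 8],
    [8, 34, 15, 22],
    [3, 7, 42, 18],
    [6, 11, 9, 28],
]
print(pool2d(feature_map))                # max pooling: [[34, 22], [11, 42]]
print(pool2d(feature_map, pool_fn=mean))  # average pooling: [[18.5, 12.5], [6.75, 24.25]]
```

Max pooling keeps each window's single strongest response, while average pooling blends all four, which dilutes a strong but isolated activation such as the 42.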

Why Pooling Improves CNN Performance

Key Benefits

1. Dimensionality Reduction

Impact: 2×2 pooling reduces spatial size by 75%

Business Value: Faster processing enables real-time applications (e.g., live quality control on production lines)

Example: 224×224 image → 112×112 → 56×56 → 28×28

2. Translation Invariance

Impact: Small shifts in feature location (within a pooling window) often leave the output unchanged

Business Value: Product can be slightly off-center in image, model still classifies correctly

Example: Logo detected whether left, center, or right

3. Feature Selection

Impact: Keeps only the strongest activations (most confident detections)

Business Value: Focuses on most distinctive features, improves classification accuracy

Example: Retains clear defects, discards noise

4. Computational Efficiency

Impact: Reduces memory usage and processing time

Business Value: Deploy models on edge devices (mobile, embedded systems) for on-site inspection

Example: Smartphone app for field inspections

Trade-off Consideration

Pooling discards some spatial information. Modern architectures (like ResNet) use techniques like strided convolutions as alternatives, but pooling remains widely used for its simplicity and effectiveness.

Knowledge Check: Pooling Operations

A feature map of size 64×64×128 passes through a 2×2 max pooling layer with stride 2. What is the output size?

A) 32 × 32 × 128

B) 64 × 64 × 64

C) 32 × 32 × 64

D) 62 × 62 × 128

Hint: Pooling reduces spatial dimensions (height and width) but does not change the depth (number of feature maps/channels).

Putting It Together: Complete CNN Forward Pass

Data Flow Through Network

Input: Product Image

224 × 224 × 3 (RGB image)

Original photo from production line camera

↓

Convolutional Block 1

Conv Layer: 32 filters (3×3) → 224×224×32

ReLU Activation: Set negative values to zero

Max Pool (2×2): → 112×112×32

Learns: Basic edges, color transitions

↓

Convolutional Block 2

Conv Layer: 64 filters (3×3) → 112×112×64

ReLU Activation: Set negative values to zero

Max Pool (2×2): → 56×56×64

Learns: Corners, simple shapes, textures

↓

Convolutional Block 3

Conv Layer: 128 filters (3×3) → 56×56×128

ReLU Activation: Set negative values to zero

Max Pool (2×2): → 28×28×128

Learns: Component parts, assemblies, defect patterns

↓

Flatten Layer

28 × 28 × 128 = 100,352 values

Convert 3D tensor to 1D vector for dense layers

↓

Dense Layer

128 neurons with ReLU activation

Combines all learned features for decision-making

↓

Output Layer (Softmax)

2 neurons → 2 class probabilities

Class 0 (Defective): 2%
Class 1 (Pass): 98%

Prediction: Product Passes Quality Control ✓
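The output layer's softmax turns the two output neurons' raw scores (logits) into probabilities that sum to 1. In the sketch below, the scores 0 and ln(49) are hypothetical values chosen so that the result matches the 2% / 98% example above; a trained network would produce its own scores.

```python
import math

def softmax(scores):
    """Convert raw output scores (logits) into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from the two output neurons: [defective, pass]
logits = [0.0, math.log(49)]
probs = softmax(logits)
labels = ["Defective", "Pass"]
for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")  # Defective: 2%, then Pass: 98%
prediction = labels[probs.index(max(probs))]
print("Prediction:", prediction)  # Pass
```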

Activation Functions: Introducing Non-Linearity

Why Activation Functions?

Without activation functions, stacking multiple convolutional layers would be mathematically equivalent to a single layer. Activation functions introduce non-linearity, enabling CNNs to learn complex patterns.

ReLU: The Standard Choice

ReLU (Rectified Linear Unit) is the most common activation function in CNNs.

ReLU Operation: f(x) = max(0, x)

How ReLU Works

Positive values: Pass through unchanged
Negative values: Converted to zero
Effect: Keeps positive feature activations, zeroes out negative ones

Why ReLU?

Computationally efficient (simple comparison)
Helps mitigate the vanishing gradient problem
Introduces sparsity (many zeros), which can improve generalization

Business Analogy: ReLU is like a quality filter that only passes signals above a threshold of zero. Negative responses, which indicate the pattern is absent, are zeroed out, while positive feature detections pass through unchanged. This helps the network focus on the most discriminative patterns.
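ReLU is simple enough to express in one line. The sketch below applies it to a hypothetical row of filter responses and reports how many activations become zero, the sparsity mentioned above.

```python
def relu(x: float) -> float:
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)

# Hypothetical row of filter responses from a convolutional layer
responses = [-510.0, -12.5, 0.0, 3.2, 765.0]
activated = [relu(v) for v in responses]
print(activated)  # [0.0, 0.0, 0.0, 3.2, 765.0]

zero_fraction = activated.count(0.0) / len(activated)
print(f"{zero_fraction:.0%} of activations are zero")  # 60% of activations are zero
```

Note that the weak positive response (3.2) survives unchanged: ReLU's cut-off is at zero, not at some "strength" threshold.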

Training CNNs: Learning from Data

Training Process Overview

CNNs learn filter weights and dense layer parameters through supervised learning using labeled training data.

1. Forward Pass

Input image flows through network to produce prediction

Example: Image → Network → Predicts "Defective" with 75% confidence

↓

2. Calculate Loss

Measure how wrong the prediction was compared to true label

Example: True label was "Pass" → Large error (prediction was wrong)

↓

3. Backpropagation

Calculate how each parameter contributed to the error

Technical: Compute gradients of loss with respect to all weights

↓

4. Update Weights

Adjust filter weights and dense layer parameters to reduce error

Goal: Improve prediction accuracy on next iteration

↓

5. Repeat

Process thousands of images over multiple epochs

Result: Filters learn to detect task-relevant patterns
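The five steps map directly onto a gradient-descent loop. A full CNN is too large to train by hand, so this sketch swaps in the smallest possible model: a logistic regression with one weight and one bias, applied to a hypothetical one-number "defect score" per image. The loop structure (forward pass, loss, gradients, update, repeat) is the same one a CNN framework runs, only over millions of parameters and with gradients computed automatically. The data, learning rate, and epoch count are illustrative assumptions.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: one standardized "defect score" per image, label 1 = defective
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0        # parameters start uninformed
learning_rate = 0.5
losses = []

for epoch in range(100):                      # 5. Repeat over many epochs
    grad_w = grad_b = loss = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)                # 1. Forward pass: predicted P(defective)
        loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))  # 2. Cross-entropy loss
        grad_w += (p - y) * x                 # 3. Gradient of the loss w.r.t. w
        grad_b += p - y                       #    and w.r.t. b
    n = len(data)
    w -= learning_rate * grad_w / n           # 4. Update: step against the average gradient
    b -= learning_rate * grad_b / n
    losses.append(loss / n)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("predictions:", [round(sigmoid(w * x + b)) for x, _ in data])
```

The average loss starts at ln 2 (about 0.693, a coin-flip guess) and falls steadily as the weight grows, until all four toy examples are classified correctly.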

Training Data Requirements: As a rough guide, a typical CNN needs 1,000-10,000+ labeled examples per class. For quality control with 2 classes (defective/pass), that means at least 2,000-20,000 labeled images.

Transfer Learning: Leveraging Pre-Trained Models

The Challenge of Training from Scratch

Training CNNs from scratch often requires massive datasets (millions of images)
Training time: Days to weeks on high-performance GPUs
Computational cost: Thousands of dollars in cloud computing
Most businesses don't have sufficient labeled data

The Solution: Transfer Learning

Core Idea: Start with a CNN already trained on millions of images (e.g., ImageNet with 1.2M images, 1,000 categories). The early layers have learned universal visual features (edges, textures, shapes) that transfer to new tasks.

How Transfer Learning Works

Step 1: Start with Pre-Trained Model

Use model trained on ImageNet (e.g., VGG16, ResNet50, EfficientNet)

Benefit: Proven feature extractors already learned

Step 2: Remove Final Layers

Discard the original classification head (1,000 ImageNet classes)

Keep: All convolutional layers (learned features)

Step 3: Add Custom Classifier

Add new dense layers for your specific task (e.g., 2 classes: defective/pass)

Initially: Only these new layers are trained (the pre-trained layers stay frozen)

Step 4: Fine-Tune on Your Data

Train on your smaller dataset (1,000-5,000 images often sufficient)

Result: Task-specific classifier in hours instead of weeks
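A quick count shows how little of the model is trained in Step 3. The sketch assumes a ResNet50 base whose globally average-pooled output is a 2,048-value feature vector, a single new dense output layer for 2 classes, and roughly 23.5 million parameters in the frozen base (ResNet50 without its 1,000-class head). These figures are approximations for illustration, and the helper name is hypothetical.

```python
def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Weights plus one bias per output neuron."""
    return (n_inputs + 1) * n_outputs

FEATURES = 2048                   # assumed ResNet50 feature-vector length after pooling
NUM_CLASSES = 2                   # defective / pass
FROZEN_BASE_PARAMS = 23_500_000   # approximate ResNet50 size without its original head

head_params = dense_params(FEATURES, NUM_CLASSES)
share_trained = head_params / (head_params + FROZEN_BASE_PARAMS)
print(f"new head (trained): {head_params:,} parameters")
print(f"pre-trained base (frozen): ~{FROZEN_BASE_PARAMS:,} parameters")
print(f"fraction of the model being trained: {share_trained:.3%}")
```

Only about 4,000 parameters are learned from your data, which is why a few thousand labeled images can be enough. Unfreezing some pre-trained layers during fine-tuning raises this number and usually calls for more data and a smaller learning rate.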

Transfer Learning Business Impact

Comparison: Training from Scratch vs. Transfer Learning

Factor	Training from Scratch	Transfer Learning	Advantage
Training Data Required	100,000+ images per class	500-5,000 images per class	95% reduction
Training Time	5-14 days on GPU	2-8 hours on GPU	50× faster
Computational Cost	$2,000-$5,000	$50-$200	95% cost savings
Final Accuracy	85-90% (limited data)	93-98% (pre-learned features)	+8% accuracy gain
Time to Production	3-6 months	2-4 weeks	10× faster deployment

Real-World Success Story

Medical Imaging Startup: A company developing diabetic retinopathy detection needed to classify eye images. Using transfer learning with ResNet50:

• Dataset: 3,500 labeled images (vs. 100,000+ needed from scratch)
• Training time: 6 hours (vs. estimated 2 weeks from scratch)
• Accuracy: 96.8% (exceeding ophthalmologist performance)
• Time to market: 1 month (vs. 6+ months estimated)
• Result: FDA approval and deployment to 50+ clinics

Popular Pre-Trained CNN Architectures

Leading Models for Transfer Learning

VGG16 / VGG19

Released: 2014

Depth: 16-19 layers

Parameters: 138M (VGG16)

Strengths: Simple architecture, easy to understand, excellent for teaching

Use Case: Good baseline for many tasks

ResNet50 / ResNet101

Released: 2015

Depth: 50-101 layers (a 152-layer variant, ResNet152, also exists)

Parameters: 25M (ResNet50)

Strengths: Skip connections enable very deep networks, excellent accuracy

Use Case: Industry standard for most applications

InceptionV3

Released: 2015

Depth: 48 layers

Parameters: 24M

Strengths: Multi-scale processing, efficient computation

Use Case: Balance between accuracy and speed

EfficientNet

Released: 2019

Depth: Varies (B0-B7)

Parameters: 5-66M

Strengths: State-of-the-art accuracy at release with fewer parameters, scalable

Use Case: Best for production deployment, mobile devices

Selecting the Right Architecture

For Learning: Start with VGG16 (simple, interpretable)

For Production: ResNet50 or EfficientNet (best accuracy-efficiency trade-off)

For Mobile/Edge: EfficientNet-B0 or MobileNet (optimized for constrained devices)

For Research: Latest models (e.g., EfficientNetV2, ConvNeXt)

Knowledge Check: Transfer Learning

Your company needs to classify 5 types of manufacturing defects. You have 2,000 labeled images. Which approach is most appropriate?

A) Train a CNN from scratch with random weight initialization

B) Use transfer learning with a pre-trained model like ResNet50, replacing the final layer with 5 output neurons

C) Use a traditional machine learning algorithm like logistic regression on raw pixel values

D) Manually design edge detection filters and use a decision tree

Evaluating CNN Performance

Key Metrics for Image Classification

Accuracy

Definition: Percentage of correct predictions

Formula: Correct Predictions / Total Predictions

When to Use: Balanced datasets (similar number of examples per class)

Limitation: Misleading for imbalanced data

Precision

Definition: Of all positive predictions, how many were actually positive?

Formula: True Positives / (True Positives + False Positives)

Business Meaning: "When I flag a product as defective, how often am I right?"

Critical When: False positives are costly (wasted inspection time)

Recall (Sensitivity)

Definition: Of all actual positives, how many did we detect?

Formula: True Positives / (True Positives + False Negatives)

Business Meaning: "Of all actual defects, how many did I catch?"

Critical When: False negatives are costly (defects reach customers)

F1-Score

Definition: Harmonic mean of precision and recall

Formula: 2 × (Precision × Recall) / (Precision + Recall)

Business Meaning: Balanced measure when both false positives and false negatives matter

Use: Standard metric for imbalanced classification

Business Decision Making

Example: Quality control system with 95% accuracy, 90% precision, 98% recall

Interpretation: The system catches 98% of defects (high recall), but 10% of the products it flags as defective are actually good (90% precision). This trade-off may be acceptable if manually re-checking flagged products is cheaper than letting defects reach customers.
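All four metrics come from the same confusion-matrix counts. The sketch below uses hypothetical counts chosen to be consistent with the example (95% accuracy, 90% precision, 98% recall), with "defective" as the positive class. The helper name is illustrative, and it does not guard against division by zero when a class is never predicted.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts (positive = defective)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of products flagged defective, share truly defective
    recall = tp / (tp + fn)      # of truly defective products, share that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 441 defects caught, 49 good boards flagged, 9 defects missed, 661 good boards passed
metrics = classification_metrics(tp=441, fp=49, fn=9, tn=661)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.1%}")
```

The F1-score comes out at about 93.8%, between the precision and recall values but pulled toward the lower one, which is the behavior of a harmonic mean.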

Common CNN Challenges and Solutions

Practical Issues in Deployment

Challenge	Cause	Solution
Overfitting	Model memorizes training data, poor generalization to new images	• Data augmentation (rotations, flips, brightness changes) • Dropout layers • More training data • Regularization techniques
Class Imbalance	Far more examples of one class than others (e.g., 95% pass, 5% defective)	• Weighted loss function • Oversample minority class • Undersample majority class • Use precision/recall instead of accuracy
Limited Training Data	Insufficient labeled examples to train effectively	• Transfer learning (primary solution) • Data augmentation • Synthetic data generation • Active learning to prioritize labeling
Computational Cost	Real-time inference requirements, limited hardware	• Model compression (pruning, quantization) • Use efficient architectures (EfficientNet, MobileNet) • Cloud-based inference • Batch processing when real-time not required
Domain Shift	Training data differs from production data (lighting, angles, quality)	• Collect data from actual production environment • Domain adaptation techniques • Regular model retraining • Data augmentation to simulate variations
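One common way to build the weighted loss listed for class imbalance is inverse-frequency weighting: each class gets weight n_samples / (n_classes × n_class_samples), the same heuristic scikit-learn applies for class_weight="balanced". The sketch assumes a hypothetical training set with the table's 95% / 5% split; the helper name is illustrative.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * n_class_samples).

    Rare classes get larger weights, so each misclassified rare example
    contributes more to the training loss.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

# Hypothetical training set: 95% pass, 5% defective
counts = {"pass": 950, "defective": 50}
weights = balanced_class_weights(counts)
print(weights)  # pass ≈ 0.526, defective = 10.0
```

With these weights, each class contributes the same total weight to the loss (950 × 0.526 ≈ 50 × 10 = 500), so the model can no longer score well by predicting "pass" for everything.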

CNN Applications Across Industries

Transformative Business Use Cases

Manufacturing

Applications:

Automated quality inspection
Defect classification
Surface finish analysis
Assembly verification

ROI: 70-90% labor cost reduction, 15-30% quality improvement

Healthcare

Applications:

Medical image diagnosis (X-ray, MRI, CT)
Cancer detection
Retinopathy screening
Skin lesion classification

Impact: Earlier detection, radiologist efficiency gains of 5×

Retail & E-commerce

Applications:

Visual product search
Automated tagging
Inventory monitoring
Cashierless checkout

Impact: 40% increase in product discovery, 25% conversion lift

Agriculture

Applications:

Crop disease detection
Yield prediction
Weed identification
Livestock monitoring

Impact: 30% reduction in crop loss, pesticide savings of 40%

Autonomous Vehicles

Applications:

Object detection (pedestrians, vehicles)
Lane detection
Traffic sign recognition
Obstacle classification

Impact: Foundation of self-driving technology

Security

Applications:

Facial recognition
Anomaly detection
License plate reading
Crowd analysis

Impact: 70% reduction in security incidents, real-time alerting

Week 8 Summary: CNNs for Image Classification

Key Takeaways

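CNNs solve the image data challenge by using local connectivity, parameter sharing, and hierarchical learning. In this week's example, the three convolutional layers hold fewer than 100,000 parameters, compared with 150M+ weights for a single 1,000-neuron dense layer on the same image, while achieving superior performance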
Convolution detects patterns through filters that slide across images, creating feature maps that highlight where specific patterns appear
Hierarchical learning mimics human vision by progressively building from simple features (edges) to complex concepts (objects)
Pooling reduces dimensionality efficiently while preserving critical information and enabling translation invariance
Transfer learning democratizes AI by enabling small businesses to build accurate models with thousands of images instead of millions, reducing training time from weeks to hours
CNNs transform industries through automated visual inspection, medical diagnosis, autonomous systems, and countless other applications

Next Week Preview

Week 9: Advanced CNN Topics
• Object detection (finding and localizing multiple objects)
• Semantic segmentation (pixel-level classification)
• Model interpretability (understanding what CNNs learn)
• Deployment strategies (edge devices, cloud, mobile)

Practical Assignment

In your lab session, you will:

Build an image classifier using Orange Data Mining
Apply transfer learning with pre-trained models
Evaluate performance using multiple metrics
Compare different CNN architectures
Deploy your model to classify new images