DATA 4800

Introduction to Neural Networks

Workshop 7

Artificial Intelligence and Machine Learning

Kaplan Business School

Today's Learning Agenda

🧠 From Machine Learning to Deep Learning

Understanding when and why we need neural networks

⚡ Artificial Neurons

Building blocks inspired by biology

🔗 Neural Network Components

Weights, biases, and activation functions

🎯 Perceptron Training

Learning from examples step by step

🌐 Multilayer Networks

Overcoming limitations with hidden layers

🔄 Backpropagation

The learning algorithm that powers neural networks

🚀 Network Architectures

Different types for different problems

When Do We Need Deep Learning?

Traditional Machine Learning

Works well for:

Structured data (tables, spreadsheets)
Linear relationships
Small datasets (< 10,000 rows)
Clear feature engineering

Example: Predicting house prices
Features: bedrooms, size, location → Price

Deep Learning Required

Essential for:

Unstructured data (images, text, audio)
Complex patterns
Large datasets (> 100,000 samples)
Automatic feature discovery

Example: Image recognition
Raw pixels → Complex patterns → Object classification

Data-Driven Evidence

ImageNet Challenge (top-5 error rate): best traditional ML entries ~26% → deep learning below 5%, i.e. over 95% top-5 accuracy

The Foundation: Artificial Neurons

Deep Learning = Networks of Artificial Neurons

Deep Learning = Many neurons working together to learn complex patterns

Artificial Neurons & Biological Inspiration

Biological Neuron

Dendrites: Receive signals
Cell Body: Processes information
Axon: Sends output signal
Synapses: Connection strength

Artificial Neuron

Inputs (x): Like dendrites
Weights (w): Like synaptic strength
Activation function: Like cell body processing
Output (y): Like axon signal

Neural Network Components: Weights & Inputs

Key Components

Inputs (xᵢ)

Customer data fed into the neuron

Example: Age, Income, Purchase history

Weights (wᵢ)

Importance of each input connection

Higher weight = more influence on output

Bias (b)

Adjusts the neuron's sensitivity

Shifts the decision boundary

🎯 Training Goal: Find the best weights and biases!
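
As a minimal sketch of how inputs, weights, and bias combine, the snippet below computes the weighted sum for a single neuron. The feature values, weights, and bias are made-up illustrative numbers (assumed to be already scaled), not learned parameters.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (the pre-activation value z)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# Hypothetical, already-scaled customer features: age, income, purchase history
x = [0.4, 0.7, 0.2]
w = [0.5, 1.0, -0.5]   # illustrative weights: income has the most influence
b = 0.1                # illustrative bias: shifts the decision boundary

z = neuron_output(x, w, b)
print(round(z, 2))     # 0.9
```

Raising a weight increases the influence of its input on z, while changing the bias moves z up or down for every input at once.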

Activation Functions: The Decision Makers

Activation functions determine when a neuron should "fire" or activate

Step Function

Output: 0 or 1

Used in perceptrons

ReLU

f(x) = max(0, x)

Sigmoid

f(x) = 1/(1+e⁻ˣ)

Output: 0 to 1

Data-Driven Example: Customer Purchase Prediction

Input: Customer spends $500 → Weighted sum: 2.1 → ReLU(2.1) = 2.1 → Likely to purchase!

Input: Customer spends $50 → Weighted sum: -0.3 → ReLU(-0.3) = 0 → Unlikely to purchase
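
The three activation functions above can be sketched in a few lines. The step function here assumes a threshold at zero, which is a common convention but is not stated on the slide.

```python
import math

def step(z):
    """Step activation: 1 if z reaches the threshold (assumed to be 0), else 0."""
    return 1 if z >= 0 else 0

def relu(z):
    """ReLU: passes positive values through, clips negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The two weighted sums from the customer purchase example
for z in (2.1, -0.3):
    print(z, step(z), relu(z), round(sigmoid(z), 3))
```

For the weighted sums in the purchase example, `relu(2.1)` returns 2.1 and `relu(-0.3)` returns 0.0, matching the slide.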

🧮 Exercise: Calculate Neuron Output

Given Network

Calculate for Both Activation Functions

Part A: Using ReLU Function

ReLU(x) = max(0, x)

1. Calculate weighted sum:

z = ______ + 0.5 = ______

2. Apply ReLU activation:

Output = ReLU(______) = ______

Part B: Using Sigmoid Function

Sigmoid(x) = 1/(1 + e⁻ˣ)

1. Use the same weighted sum z = ______

2. Apply Sigmoid activation:

Output = 1/(1 + e^(-______)) = ______

Perceptron Limitation: Linear Separability

Single-layer perceptrons can only solve linearly separable problems

✅ Linearly Separable (AND Gate)

Perceptron can solve this!

❌ NOT Linearly Separable (XOR Gate)

Perceptron cannot solve this!

🚫 The XOR Problem

Highlighted by Minsky and Papert in 1969, this limitation contributed to the first "AI Winter" in the 1970s. Solution: Multilayer Neural Networks!
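
The limitation can be seen by training a single perceptron on both gates. The sketch below uses the classic perceptron learning rule (add the error times the input to each weight); the epoch count and learning rate of 1 are arbitrary choices for illustration.

```python
def predict(w, b, x1, x2):
    """Step activation on the weighted sum."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

def train_perceptron(samples, epochs=25, lr=1.0):
    """Perceptron learning rule: nudge weights by (target - prediction) * input."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            err = target - predict(w, b, x1, x2)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    hits = sum(predict(w, b, x1, x2) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(AND, *train_perceptron(AND)))  # 1.0
print("XOR accuracy:", accuracy(XOR, *train_perceptron(XOR)))  # always below 1.0
```

On AND the weights settle after a few epochs. On XOR they keep cycling, because no single straight line separates the two classes.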

Multilayer Perceptron (MLP): Breaking the Limitation

Network Architecture

Key MLP Properties

1. Input Layer → Hidden Layer(s)

Each input connects to all hidden neurons

2. Hidden Layer(s) → Output Layer

Hidden neurons connect to all outputs

3. Feed-Forward Structure

Information flows in one direction only

4. Fully Connected

Every neuron connects to every neuron in next layer

MLP in Action: XOR Problem Solved

Watch how hidden layers enable complex pattern recognition

XOR Truth Table

X₁	X₂	Y
0	0	0
0	1	1
1	0	1
1	1	0

💡 Key Insight

Hidden layers create feature representations that make non-linear problems linearly separable!
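
One way to see this insight is a hand-wired network with two hidden neurons. The weights below are chosen by hand for illustration (they are not learned, and other choices work too): one hidden neuron computes OR, the other computes AND, and the output fires for "OR but not AND".

```python
def step(z):
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    """Two hidden step neurons plus one output neuron, with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)         # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)        # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)    # OR but not AND = XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))
```

In the hidden space (h_or, h_and), the four inputs map to points that a single line can separate, which is exactly what the output neuron does.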

Backpropagation: How Neural Networks Learn

The Learning Process

The Learning Challenge

The Problem:

How do we train networks with hidden layers? We can't directly see what hidden neurons should output!

The Solution:

Backpropagation - propagate errors backward through the network to update all weights.

Two-Phase Process:

1. Forward Pass: Calculate predictions

2. Backward Pass: Calculate and propagate errors

Step 1: Forward Propagation

Computing Outputs Layer by Layer

Forward Pass Steps

1. Input Layer

Feed raw data into network

2. Calculate Hidden Layer

z = Σ(wᵢ × xᵢ) + b

h = activation(z)

3. Calculate Output Layer

Use hidden layer outputs as inputs

4. Get Final Prediction

Compare with expected output
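
The forward pass steps can be sketched as one reusable layer function applied twice. The 2-2-1 network shape, the weights, and the choice of sigmoid are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One fully connected layer: z_j = sum_i(w_ji * x_i) + b_j, then sigmoid(z_j)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical 2-2-1 network (one row of weights per neuron)
W_hidden = [[0.5, -0.5], [0.3, 0.8]]
b_hidden = [0.0, 0.0]
W_out = [[1.0, -1.0]]
b_out = [0.0]

x = [1.0, 0.0]                                   # 1. input layer
h = layer_forward(x, W_hidden, b_hidden)         # 2. hidden layer
y_hat = layer_forward(h, W_out, b_out)[0]        # 3. output layer
print("hidden:", h, "prediction:", y_hat)        # 4. final prediction
```

The output layer reuses the same function because it treats the hidden activations `h` exactly like inputs.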

Step 2: Calculate Error

Measuring how wrong our prediction is

Error Metrics

Squared Error (for one sample)

E = ½(Y - Ŷ)²

Where Y = expected, Ŷ = predicted

Why Square the Error?

Never negative, so errors cannot cancel out
Penalizes large errors more
Mathematically convenient

Interactive Error Calculation

Customer Purchase Prediction Example

Expected Output

1

Will buy

Network Prediction

0.3

30% confidence

Error Calculation:

E = ½(1 - 0.3)² = ½(0.7)² = 0.245

Goal: Adjust weights to minimize this error!
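
The error calculation above is a one-liner in code. This small sketch reproduces the 0.245 result and shows how squaring penalizes a large miss far more than a small one; the function name is just an illustrative choice.

```python
def half_squared_error(expected, predicted):
    """E = 1/2 * (Y - Y_hat)^2 for a single prediction."""
    return 0.5 * (expected - predicted) ** 2

print(round(half_squared_error(1, 0.3), 3))   # 0.245 (the slide example)
print(round(half_squared_error(1, 0.9), 3))   # 0.005 (a small miss costs very little)
```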

Step 3: Backward Propagation

Error Flows Backward

Understanding Gradients

What are Gradients?

Gradients tell us:

Direction: Which way to change weights
Magnitude: How much to change them

Chain Rule in Action

∂E/∂w = ∂E/∂y × ∂y/∂z × ∂z/∂w

How changing weight w affects final error E
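
For a single sigmoid neuron with the squared error E = ½(Y - Ŷ)², each factor of the chain rule has a simple form: ∂E/∂y = -(Y - y), ∂y/∂z = y(1 - y), and ∂z/∂w = x. The sketch below multiplies these factors and compares the result against a numerical estimate; the input, weight, bias, and target are hypothetical values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(w, x, b, target):
    y = sigmoid(w * x + b)
    return 0.5 * (target - y) ** 2

def gradient(w, x, b, target):
    """dE/dw via the chain rule for one sigmoid neuron."""
    y = sigmoid(w * x + b)
    dE_dy = -(target - y)      # how the error changes with the output
    dy_dz = y * (1.0 - y)      # slope of the sigmoid
    dz_dw = x                  # how the weighted sum changes with w
    return dE_dy * dy_dz * dz_dw

w, x, b, target = 0.5, 2.0, 0.1, 1.0   # hypothetical values
analytic = gradient(w, x, b, target)

# Numerical check: nudge w slightly in both directions and measure the slope
eps = 1e-6
numeric = (error(w + eps, x, b, target) - error(w - eps, x, b, target)) / (2 * eps)
print(analytic, numeric)
```

The gradient is negative here, meaning that increasing w would lower the error.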

Error Attribution

Each weight gets "blame" proportional to its contribution to the error

Step 4: Gradient Descent - The Optimization Method

Gradient Descent Visualization

Weight Update Rule

w_new = w_old - η × ∂E/∂w

Learning Rate (η)

Controls how big a step we take:

Too large: Might overshoot minimum
Too small: Learning takes forever
Just right: Steady convergence

Example Update:

If weight = 0.5, gradient = -0.3, learning rate = 0.1:

New weight = 0.5 - 0.1 × (-0.3) = 0.53
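
The update rule and the effect of the learning rate can be sketched on a toy error curve. The curve E(w) = (w - 3)², with its minimum at w = 3, is an assumption chosen only because its gradient, 2(w - 3), is easy to write down.

```python
def update_weight(weight, gradient, learning_rate):
    """w_new = w_old - learning_rate * gradient"""
    return weight - learning_rate * gradient

print(round(update_weight(0.5, -0.3, 0.1), 2))   # 0.53 (the slide example)

def descend(learning_rate, steps=20, w=0.0):
    """Run gradient descent on the toy curve E(w) = (w - 3)^2."""
    for _ in range(steps):
        w = update_weight(w, 2 * (w - 3), learning_rate)
    return w

print(descend(0.1))     # just right: ends close to 3
print(descend(1.1))     # too large: overshoots further on every step
print(descend(0.001))   # too small: has barely moved from 0 after 20 steps
```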

Current Status:

Position: High error region

Gradient: Points uphill, away from the minimum

Next step: Move down the slope, against the gradient

Complete Backpropagation Algorithm

The complete learning process in action

Algorithm Steps

1. Initialize

Set random weights between -0.5 and 0.5

2. Forward Pass

Calculate all neuron outputs from input to output layer

3. Calculate Error

Compare prediction with expected output

4. Backward Pass

Calculate gradients for all weights using chain rule

5. Update Weights

Adjust all weights using gradient descent

6. Repeat

Continue until error is minimized or max epochs reached
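
The six steps can be put together in a small from-scratch sketch that trains a network on XOR. The network size (4 hidden neurons), learning rate, epoch count, and random seed are assumptions for illustration, and the final error depends on the random starting weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(hidden=4, lr=0.5, epochs=10000, seed=0):
    rng = random.Random(seed)
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    # 1. Initialize: random weights between -0.5 and 0.5
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b_o = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(epochs):                      # 6. Repeat
        total = 0.0
        for x, target in data:
            # 2. Forward pass
            h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_h, b_h)]
            y = sigmoid(sum(wo * hj for wo, hj in zip(w_o, h)) + b_o)
            # 3. Calculate error
            total += 0.5 * (target - y) ** 2
            # 4. Backward pass (chain rule)
            d_out = -(target - y) * y * (1 - y)
            d_hid = [d_out * w_o[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # 5. Update weights (gradient descent)
            for j in range(hidden):
                w_o[j] -= lr * d_out * h[j]
                w_h[j][0] -= lr * d_hid[j] * x[0]
                w_h[j][1] -= lr * d_hid[j] * x[1]
                b_h[j] -= lr * d_hid[j]
            b_o -= lr * d_out
        history.append(total)                    # summed error for this epoch
    return history

history = train_xor()
print("first epoch error:", round(history[0], 3))
print("last epoch error:", round(history[-1], 3))
```

Weights are updated after every sample here, which is the simplest variant. Frameworks such as TensorFlow and PyTorch compute the backward pass automatically.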

Live Training Demo

Training XOR Function

Epoch: 0

Error: 1.000

Current Training Sample:

[0, 0] → Expected: 0, Predicted: 0.5

Weight Updates This Step:

w₁: 0.23 → 0.31 (+0.08)
w₂: -0.15 → -0.09 (+0.06)
w₃: 0.44 → 0.39 (-0.05)

Training Multilayer Networks: The Complete Process

Training Phases

🔄 Epoch

One complete pass through all training data

Small networks often need 100 to 1,000+ epochs; the right number depends on the data and the model

📊 Batch Processing

Process multiple samples before updating weights

Improves stability and efficiency

⚡ Learning Rate Schedule

Adjust learning rate during training

Start large, decrease over time

🛑 Stopping Criteria

When to stop training:

Error below threshold
Max epochs reached
No improvement
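
These training phases map onto a few small helper functions. The sketch below is one possible way to express them; the function names, the halving schedule, and the patience of 5 epochs are all illustrative assumptions rather than fixed rules.

```python
def mini_batches(samples, batch_size):
    """Split the training data into consecutive batches."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Learning rate schedule: start large, halve the rate every `every` epochs."""
    return initial_lr * (drop ** (epoch // every))

def should_stop(val_losses, threshold=0.01, max_epochs=1000, patience=5):
    """Return the reason to stop training, or None to keep going."""
    if not val_losses:
        return None
    if val_losses[-1] < threshold:
        return "error below threshold"
    if len(val_losses) >= max_epochs:
        return "max epochs reached"
    best_epoch = val_losses.index(min(val_losses))
    if len(val_losses) - 1 - best_epoch >= patience:
        return "no improvement"
    return None

print(list(mini_batches([1, 2, 3, 4, 5], 2)))                  # [[1, 2], [3, 4], [5]]
print(step_decay(0.1, 0), step_decay(0.1, 10))                 # 0.1 0.05
print(should_stop([0.5, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45]))   # no improvement
```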

Training Challenges & Solutions

⚠️ Common Problems

Vanishing Gradients: Gradients become too small in deep networks
Overfitting: Network memorizes training data
Local Minima: Getting stuck in suboptimal solutions
Slow Convergence: Taking too long to learn

Warning Signs:

Training accuracy 95%, Test accuracy 60% = Overfitting!

✅ Solutions

Better Activations: ReLU instead of sigmoid
Regularization: Prevent overfitting
Momentum: Help escape local minima
Adaptive Learning: Adjust rates automatically

Modern Techniques:

Dropout, Batch Normalization, Adam optimizer

💡 Best Practices

Start Simple: Few layers, then add complexity
Monitor Both: Training and validation loss
Early Stopping: Stop when validation loss stops improving
Cross-Validation: Evaluate on data the model has not seen during training

Golden Rule:

"Make it work, then make it better"

💼 Business Impact

Example: Netflix uses neural networks to recommend movies. Better training = better recommendations = happier customers = more revenue!

🧮 Exercise: Manual Backpropagation Calculation

Given Network

📋 Given Information

Activation: Sigmoid function

Learning Rate: η = 0.5

Expected Output: 1

Step-by-Step Calculation

Part A: Forward Pass

1. Calculate hidden layer input:

z_h = ______ = ______

2. Apply sigmoid activation:

h = σ(______) = ______

3. Calculate output:

y = σ(______) = ______

Part B: Backward Pass

1. Calculate output error:

δ_out = ______

2. Calculate hidden error:

δ_h = ______

Part C: Weight Updates

1. Update w₃:

w₃_new = ______

2. Update w₁:

w₁_new = ______

3. Update w₂:

w₂_new = ______

MLPs: Solving Non-Linear Problems

Hidden layers enable neural networks to learn complex patterns

Linearly Separable

Single Layer Sufficient

One line separates classes

Complex Pattern

Multiple Layers Needed

Circular decision boundary

Real-World Data

Deep Networks Required

Irregular, complex boundaries

🎯 Universal Approximation Theorem

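A neural network with just one hidden layer and a non-linear activation can approximate any continuous function on a bounded input range as closely as desired, given enough neurons! The theorem guarantees that such a network exists, not that training will find it.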

Neural Network Terminology: Clearing the Confusion

Understanding the relationship between different terms

Term	Definition	Layers	Capability
Perceptron	Single artificial neuron	Input → Output	Linear classification only
MLP	Multiple layers of perceptrons	Input → Hidden(s) → Output	Non-linear problems
Neural Network	General term for all architectures	Various architectures	Depends on architecture

Feedforward Neural Networks

Architecture

Feedforward Properties

Characteristics

Unidirectional: Information flows forward only
No loops: No feedback connections
Layer-wise: Organized in distinct layers
Dense connections: Neurons connect to all neurons in next layer

Applications

Image classification
Regression problems
Pattern recognition
Feature extraction

Real Example

Credit Scoring: Input customer data → Hidden layers learn risk patterns → Output credit score

Convolutional Neural Networks (CNNs)

CNN Architecture

CNN Key Features

Convolutional Layers

Filters/Kernels: Detect specific patterns
Local connectivity: Each neuron sees small region
Parameter sharing: Same filter across entire image

Pooling Layers

Dimensionality reduction: Smaller feature maps
Translation invariance: Robust to small shifts
Max/Average pooling: Keep strongest features
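
A convolution followed by max pooling can be sketched in plain Python. The 5×5 image and the 2×2 edge filter below are made-up examples; real CNNs learn their filter values during training and use optimized library routines.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2d(fmap, size=2):
    """Keep the strongest response in each non-overlapping size x size block."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# Hypothetical image: dark on the left (0), bright on the right (1)
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1],
               [-1, 1]]              # responds to dark-to-bright vertical edges

features = conv2d(image, edge_kernel)
pooled = max_pool2d(features)
print(features)   # four rows of [0, 2, 0, 0]: the filter fires only at the edge
print(pooled)     # [[2, 0], [2, 0]]: smaller map, edge still detected
```

The same four filter values are reused at every position (parameter sharing), and each output value depends only on a 2×2 patch of the image (local connectivity).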

Perfect for:

Image classification, facial recognition, medical imaging, autonomous vehicles

Recurrent Neural Networks (RNNs)

RNN Architecture

RNN Applications

Sequential Data Processing

Memory: Maintains information from previous inputs
Variable length: Handles sequences of any length
Temporal patterns: Learns patterns over time
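
The "memory" of an RNN comes from feeding the previous hidden state back in at every step. Below is a minimal sketch with a single hidden unit, using the common update h_t = tanh(w_x · x_t + w_h · h_(t-1) + b); the weight values are arbitrary illustrative choices.

```python
import math

def rnn_forward(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Scalar RNN: each new state mixes the current input with the previous state."""
    h = 0.0                      # initial hidden state (no memory yet)
    states = []
    for x_t in sequence:         # works for sequences of any length
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1, 0, 0])
print([round(s, 3) for s in states])
```

Even though the second and third inputs are 0, the hidden state stays above 0 because it still carries a fading trace of the first input.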

Real-World Uses

Language Translation: "Hello" → "Hola"
Speech Recognition: Audio → Text
Stock Prediction: Price history → Future price
Sentiment Analysis: Text → Emotion

Example: Text Analysis

Input: "The movie was really..."

RNN predicts: "good" (based on context)

Generative Adversarial Networks (GANs)

GAN Architecture

How GANs Work

Generator Network

Creates fake data from random noise

Goal: Fool the discriminator

Discriminator Network

Distinguishes real from fake data

Goal: Catch the generator's fakes

Adversarial Training

Two networks compete and improve together

Applications

Generating realistic images
Style transfer
Data augmentation
Creating deepfakes

Transformer Networks

Transformer Architecture

Transformer Innovation

Self-Attention Mechanism

Parallel processing: All positions simultaneously
Long-range dependencies: Connect distant words
No sequential bottleneck: Unlike RNNs
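
The core of self-attention is scaled dot-product attention: every position scores its similarity to every other position, and the scores become mixing weights through softmax. The sketch below is heavily simplified. It uses the input vectors directly as queries, keys, and values, whereas a real Transformer first applies learned projections; the token vectors are made up.

```python
import math

def softmax(scores):
    m = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product attention with Q = K = V = the input vectors."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:                                # each position can be computed independently
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)                    # attention weights sum to 1
        all_weights.append(weights)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs, all_weights

# Hypothetical 2-dimensional token vectors: tokens 0 and 2 are similar
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print([round(w, 3) for w in weights[0]])
```

Token 0 gives more weight to token 2 than to token 1, even though token 2 is further away. Distance in the sequence does not matter, only similarity.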

Revolutionary Applications

ChatGPT: Conversational AI
Google Translate: Language translation
DALL-E: Text-to-image generation
Code generation: GitHub Copilot

Example: Attention in Action

"The animal didn't cross the street because it was too tired"

Attention helps determine that "it" refers to "animal"

Other Specialized Neural Networks

Radial Basis Function Networks

Uses: Function approximation, time series prediction

Radial basis functions as activation functions

Self-Organizing Maps

Uses: Clustering, data visualization

Maps high-dimensional data to 2D grid

Modular Neural Networks

Uses: Complex tasks with subtasks

Specialized modules for different parts

Autoencoders

Uses: Dimensionality reduction, denoising

Learns compressed representations

Choosing the Right Neural Network Architecture

Different problems require different network types

Data Type	Problem	Best Architecture	Example Application
Images	Object detection	CNN	Medical X-ray diagnosis
Text/Sequences	Language processing	Transformer/RNN	Language translation
Tabular Data	Prediction/Classification	Feedforward/MLP	Customer churn prediction
Time Series	Forecasting	RNN/LSTM	Stock price prediction
Generation Tasks	Create new data	GAN/VAE	Art generation
Dimensionality Reduction	Data compression	Autoencoder	Data visualization

Decision Framework

Ask yourself: What type of data do I have? What am I trying to predict or generate? How complex are the patterns?

Neural Network Performance Comparison

Real-world performance data across different tasks

Image Classification (ImageNet)

Method	Year	Top-1 Accuracy (approx.)
Traditional ML	2010	~53% (top-5 error was about 26-28%)
AlexNet (CNN)	2012	63%
ResNet (Deep CNN)	2015	77%
EfficientNet	2019	84%

Key Performance Insights

Computational Requirements

Perceptron: Minutes on laptop
MLP: Hours on laptop
CNN: Hours to days on GPU
Large Transformers: Weeks on GPU clusters

Data Requirements

Simple problems: 1,000 samples
Image classification: 10,000+ samples
Language models: Millions of samples
Large language models: Billions of samples

Accuracy Trends

More data: Generally better performance
Deeper networks: Can learn more complex patterns
Specialized architectures: Optimized for specific tasks

Neural Networks in Industry

Healthcare

Medical Imaging: Detecting cancer in X-rays
Drug Discovery: Predicting molecular properties
Diagnosis: Symptom analysis and recommendations
Genomics: DNA sequence analysis

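Impact: Google researchers reported a deep learning model that detects diabetic retinopathy in retinal photographs with roughly 90% or higher sensitivity and specificity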

Finance

Fraud Detection: Identifying suspicious transactions
Credit Scoring: Assessing loan default risk
Algorithmic Trading: Automated investment decisions
Risk Management: Portfolio optimization

Impact: JPMorgan reportedly processes around $6 trillion in payments daily, with machine learning used to screen transactions for fraud

Technology

Recommendation Systems: Netflix, Amazon, Spotify
Search Engines: Google's ranking algorithms
Voice Assistants: Siri, Alexa speech recognition
Autonomous Vehicles: Tesla's autopilot

Impact: Netflix estimates its recommendation system saves $1B annually

Market Size

Global AI Market: $150B in 2023 → Expected $1.3T by 2030

Future of Neural Networks

Emerging Trends

Neuromorphic Computing

Hardware that mimics brain structure

Claimed to be up to 1000x more energy efficient for some workloads

Quantum Neural Networks

Quantum computing meets neural networks

Potential speedups for certain problems, still largely theoretical

Few-Shot Learning

Learning from very few examples

Like human learning

Explainable AI

Understanding why networks make decisions

Critical for healthcare, finance

Multimodal Models

Processing text, images, audio together

Like GPT-4 with vision

Challenges Ahead

Technical Challenges

Energy consumption of large models
Bias and fairness in AI systems
Robustness and security
Catastrophic forgetting

Ethical Considerations

Job displacement concerns
Privacy and surveillance
Deepfakes and misinformation
AI safety and control

Research Directions

More efficient training methods
Better interpretability tools
Continual learning systems
Human-AI collaboration

Getting Started: Practical Implementation

Tools and Frameworks

Python Libraries

TensorFlow: Google's comprehensive framework
PyTorch: Research-friendly library from Meta (formerly Facebook)
Keras: High-level, user-friendly API
Scikit-learn: Traditional ML algorithms

Cloud Platforms

Google Colab: Free GPU access
AWS SageMaker: Production ML platform
Azure ML: Microsoft's ML service
Kaggle: Competitions and datasets

Getting Started

Start with simple problems
Use pre-built datasets
Follow online tutorials
Join ML communities

Development Workflow

1. Data Collection & Preparation

Clean, preprocess, and split your data

2. Choose Architecture

Select appropriate network type

3. Start Simple

Begin with basic model, add complexity gradually

4. Train & Validate

Monitor both training and validation metrics

5. Hyperparameter Tuning

Optimize learning rate, architecture, etc.

6. Deploy & Monitor

Put model into production and track performance

Common Mistakes & Best Practices

Common Mistakes

1. Using Too Complex Models

Starting with 100-layer networks for simple problems

Start simple, add complexity as needed

2. Insufficient Data

Trying to train deep networks on tiny datasets

Use simpler models or data augmentation

3. Ignoring Validation

Only looking at training accuracy

Always monitor test/validation performance

4. Poor Preprocessing

Not normalizing inputs or handling missing data

Preprocessing is crucial for good performance

5. Wrong Learning Rate

Learning rate too high (exploding) or too low (slow)

Start with 0.001, adjust based on loss curves
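
Mistake 4 (poor preprocessing) is often the easiest to fix. The sketch below shows two common steps for one numeric feature: filling missing values with the mean and standardizing to zero mean and unit variance. The function names and the sample ages are illustrative, and mean imputation is only one of several reasonable strategies.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = statistics.fmean(known)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale a feature to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]     # constant feature: nothing to scale
    return [(v - mean) / std for v in values]

ages = [20, None, 40]                    # hypothetical customer ages, one missing
filled = fill_missing(ages)
scaled = standardize(filled)
print(filled)                            # [20, 30.0, 40]
print([round(v, 3) for v in scaled])     # [-1.225, 0.0, 1.225]
```

In practice the mean and standard deviation should be computed on the training set only and then reused for validation and test data.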

Best Practices

1. Start with Baselines

Implement simple models first (logistic regression, SVM)

Beat the baseline before going complex

2. Understand Your Data

Explore distributions, correlations, missing values

Good data beats fancy algorithms

3. Monitor Everything

Track loss, accuracy, gradients, weights

Use tools like TensorBoard

4. Use Cross-Validation

Multiple train/test splits for robust evaluation

K-fold CV gives better estimates

5. Document Everything

Keep track of experiments, hyperparameters, results

Reproducibility is key in ML
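
K-fold cross-validation (best practice 4) splits the data into k parts and uses each part once as the test set. The sketch below only generates the index splits; the function name is illustrative, and shuffling the data first is left out for brevity, although it is usually advisable.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, roughly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

A model would be trained and evaluated once per fold, and the k scores averaged to get a more robust estimate than a single train/test split.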

Next Week: Deep Dive into CNNs

Week 8: Image Classification with Deep Convolutional Neural Networks

What We'll Build

Image Classifier Project

Dataset: 10,000 cat and dog images

Goal: 85%+ accuracy

Tools: Python, TensorFlow, Keras

Learning Objectives

Technical Skills

Building CNN architectures
Data preprocessing for images
Training deep networks
Evaluating model performance

Practical Experience

Handling real-world datasets
Debugging training issues
Hyperparameter optimization
Model deployment basics

Preparation for Next Week

Review matrix operations and Python basics

Install TensorFlow/Keras if working locally

Summary: Neural Networks Fundamentals

What we learned today

Core Concepts

Artificial neurons mimic biology
Weights and biases are learnable parameters
Activation functions enable non-linearity
Training finds optimal parameters

Key Algorithms

Perceptron: Single neuron classifier
MLP: Multiple layers for complex patterns
Backpropagation: The learning algorithm
Gradient descent: Optimization method

Network Types

Feedforward: Basic architecture
CNN: For image processing
RNN: For sequential data
Transformers: For language tasks

Remember: Neural networks are powerful tools, but they need:

Good data + Appropriate architecture + Proper training = Success

Questions & Discussion

Discussion Topics

Which neural network type would you choose for your business problem?
What challenges do you anticipate in implementing neural networks?
How might neural networks impact your industry?
What ethical considerations should we keep in mind?

Next Week: Hands-on CNN implementation for image classification