DATA 4800

Introduction to Neural Networks

Workshop 7

Artificial Intelligence and Machine Learning

Kaplan Business School

Today's Learning Agenda

🧠 From Machine Learning to Deep Learning

Understanding when and why we need neural networks

⚡ Artificial Neurons

Building blocks inspired by biology

🔗 Neural Network Components

Weights, biases, and activation functions

🎯 Perceptron Training

Learning from examples step by step

🌐 Multilayer Networks

Overcoming limitations with hidden layers

🔄 Backpropagation

The learning algorithm that powers neural networks

🚀 Network Architectures

Different types for different problems

When Do We Need Deep Learning?

Traditional Machine Learning

Works well for:

  • Structured data (tables, spreadsheets)
  • Linear relationships
  • Small datasets (< 10,000 rows)
  • Clear feature engineering
Example: Predicting house prices
Features: bedrooms, size, location → Price

Deep Learning Required

Essential for:

  • Unstructured data (images, text, audio)
  • Complex patterns
  • Large datasets (> 100,000 samples)
  • Automatic feature discovery
Example: Image recognition
Raw pixels → Complex patterns → Object classification

Data-Driven Evidence

ImageNet Challenge: top-5 error fell from roughly 26-28% with the best traditional ML entries (2010-2011) to under 5% with deep learning, i.e. top-5 accuracy above 95%

The Foundation: Artificial Neurons

Deep Learning = Networks of Artificial Neurons


Deep Learning = Many neurons working together to learn complex patterns

Artificial Neurons & Biological Inspiration

Biological Neuron

[Figure: biological neuron showing dendrites (inputs), cell body, and axon (output)]
  • Dendrites: Receive signals
  • Cell Body: Processes information
  • Axon: Sends output signal
  • Synapses: Connection strength

Artificial Neuron

xโ‚ wโ‚ xโ‚‚ wโ‚‚ xโ‚ƒ wโ‚ƒ ฮฃ f(x) y Inputs Weights Activation Output
  • Inputs (x): Like dendrites
  • Weights (w): Like synaptic strength
  • Activation function: Like cell body processing
  • Output (y): Like axon signal

Neural Network Components: Weights & Inputs

Xโ‚ Age: 25 Xโ‚‚ Income: 50K Xโ‚ƒ Purchases: 8 wโ‚=0.6 wโ‚‚=0.8 wโ‚ƒ=0.4 ฮฃ + bias b = 0.2 Y

Key Components

Inputs (xᵢ)

Customer data fed into the neuron

Example: Age, Income, Purchase history

Weights (wᵢ)

Importance of each input connection

Higher weight = more influence on output

Bias (b)

Adjusts the neuron's sensitivity

Shifts the decision boundary

🎯 Training Goal: Find the best weights and biases!
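
A minimal sketch of the weighted sum for the neuron above, using the values from the figure. It assumes income is entered in thousands (50 for 50K) and uses the raw, unscaled inputs purely for illustration; in practice inputs are usually normalized first. The function name `weighted_sum` is made up for this example.

```python
def weighted_sum(inputs, weights, bias):
    """Return z = sum(w_i * x_i) + b for a single neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Values from the slide: age 25, income 50 (assumed to be in thousands), 8 purchases.
inputs = [25, 50, 8]
weights = [0.6, 0.8, 0.4]
bias = 0.2

z = weighted_sum(inputs, weights, bias)
print(f"z = {z:.1f}")  # 0.6*25 + 0.8*50 + 0.4*8 + 0.2
```

Income dominates z here only because its raw value is large, which is one reason inputs are normally put on a similar scale before training.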

Activation Functions: The Decision Makers

Activation functions determine when a neuron should "fire" or activate

Step Function

Output: 0 or 1

Used in perceptrons

ReLU

f(x) = max(0, x)

Most popular today

Sigmoid

f(x) = 1/(1+e⁻ˣ)

Output: 0 to 1

Data-Driven Example: Customer Purchase Prediction

Input: Customer spends $500 → Weighted sum: 2.1 → ReLU(2.1) = 2.1 → Likely to purchase!

Input: Customer spends $50 → Weighted sum: -0.3 → ReLU(-0.3) = 0 → Unlikely to purchase
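
The three activation functions can be sketched in a few lines of plain Python. The step threshold of 0 is an assumption (the slide only says the output is 0 or 1), and the two test values are the weighted sums from the customer example.

```python
import math


def step(z, threshold=0.0):
    """Step function: fires (1) at or above the threshold, otherwise 0."""
    return 1 if z >= threshold else 0


def relu(z):
    """ReLU: passes positive values through, clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Weighted sums from the customer purchase example.
for z in (2.1, -0.3):
    print(f"z={z:+.1f}  step={step(z)}  relu={relu(z):.1f}  sigmoid={sigmoid(z):.3f}")
```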

🧮 Exercise: Calculate Neuron Output

Given Network

3 Xโ‚ = 3 -1 Xโ‚‚ = -1 wโ‚ = 0.4 wโ‚‚ = -0.7 ฮฃ + b f(x) b = 0.5 ?

Calculate for Both Activation Functions

Part A: Using ReLU Function

ReLU(x) = max(0, x)

1. Calculate weighted sum:

z = ____ + 0.5 = ____

2. Apply ReLU activation:

Output = ReLU(____) = ____

Part B: Using Sigmoid Function

Sigmoid(x) = 1/(1 + e⁻ˣ)

1. Use the same weighted sum z = ____

2. Apply Sigmoid activation:

Output = 1/(1 + e^(-____)) = ____
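
Once you have worked the exercise by hand, a short script like the one below is one way to check your answers. It uses only the values given in the exercise; the variable names are chosen for this sketch.

```python
import math

# Values given in the exercise.
x1, x2 = 3, -1
w1, w2 = 0.4, -0.7
b = 0.5

# Step 1: weighted sum z = w1*x1 + w2*x2 + b
z = w1 * x1 + w2 * x2 + b

# Part A: ReLU activation
relu_out = max(0.0, z)

# Part B: sigmoid activation on the same weighted sum
sigmoid_out = 1.0 / (1.0 + math.exp(-z))

print(f"z = {z:.2f}, ReLU = {relu_out:.2f}, Sigmoid = {sigmoid_out:.4f}")
```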

Perceptron Limitation: Linear Separability

Single-layer perceptrons can only solve linearly separable problems

✅ Linearly Separable (AND Gate)

(0,0)→0 (0,1)→0 (1,0)→0 (1,1)→1 (one straight decision boundary separates the classes)

Perceptron can solve this!

❌ NOT Linearly Separable (XOR Gate)

(0,0)→0 (0,1)→1 (1,0)→1 (1,1)→0 (no single line works!)

Perceptron cannot solve this!

🚫 The XOR Problem

This limitation, highlighted by Minsky and Papert in 1969, contributed to the first "AI Winter" in the 1970s. Solution: Multilayer Neural Networks!
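
A small sketch of perceptron training makes the limitation concrete. It uses the classic perceptron learning rule with a step activation; the zero initial weights, learning rate of 1, and 20-epoch cap are assumptions for this example, not values from the slides.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a two-input perceptron; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0
            delta = target - pred          # +1, 0, or -1
            if delta != 0:
                w[0] += lr * delta * x1    # nudge weights toward the target
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:                    # a full pass with no mistakes
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_result = train_perceptron(AND)
xor_result = train_perceptron(XOR)
print("AND converged:", and_result[2])
print("XOR converged:", xor_result[2])
```

For AND the rule settles on a separating line after a handful of passes. For XOR no epoch can ever be error-free, because a single perceptron can only draw one straight line.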

Multilayer Perceptron (MLP): Breaking the Limitation

Network Architecture

Xโ‚ Xโ‚‚ Input Layer Hโ‚ Hโ‚‚ Hโ‚ƒ Hidden Layer Y Output Layer

Key MLP Properties

1. Input Layer → Hidden Layer(s)

Each input connects to all hidden neurons

2. Hidden Layer(s) → Output Layer

Hidden neurons connect to all outputs

3. Feed-Forward Structure

Information flows in one direction only

4. Fully Connected

Every neuron connects to every neuron in the next layer

MLP in Action: XOR Problem Solved

Watch how hidden layers enable complex pattern recognition

0 Xโ‚ 0 Xโ‚‚ 0 Hโ‚ 0 Hโ‚‚ 0 Y wโ‚ wโ‚‚ wโ‚ƒ wโ‚„ wโ‚… wโ‚†

XOR Truth Table

Xโ‚ Xโ‚‚ Y
0 0 0
0 1 1
1 0 1
1 1 0

💡 Key Insight

Hidden layers create feature representations that make non-linear problems linearly separable!
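
To see how a hidden layer makes XOR solvable, here is a sketch of a tiny network with hand-picked weights (chosen for illustration, not learned and not taken from the slide). One hidden neuron computes OR, the other computes AND, and the output fires when OR is on but AND is off.

```python
def step(z):
    """Step activation with threshold 0."""
    return 1 if z >= 0 else 0


def xor_mlp(x1, x2):
    """Two hidden neurons plus one output neuron, all with hand-picked weights."""
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)   # behaves like OR
    h2 = step(1.0 * x1 + 1.0 * x2 - 1.5)   # behaves like AND
    return step(1.0 * h1 - 1.0 * h2 - 0.5)  # fires when h1 = 1 and h2 = 0


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor_mlp(x1, x2))
```

In the (h1, h2) space created by the hidden layer, the four points can be split by one straight line, which is exactly the insight above.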


Backpropagation: How Neural Networks Learn

The Learning Process

2 Xโ‚=2 3 Xโ‚‚=3 ? ? ? Expected: 1 Error: ?

The Learning Challenge

The Problem:

How do we train networks with hidden layers? We can't directly see what hidden neurons should output!

The Solution:

Backpropagation - propagate errors backward through the network to update all weights.

Two-Phase Process:

1. Forward Pass: Calculate predictions

2. Backward Pass: Calculate and propagate errors


Step 1: Forward Propagation

Computing Outputs Layer by Layer


Forward Pass Steps

1. Input Layer

Feed raw data into network

2. Calculate Hidden Layer

z = Σ(wᵢ × xᵢ) + b

h = activation(z)

3. Calculate Output Layer

Use hidden layer outputs as inputs

4. Get Final Prediction

Compare with expected output
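
The forward pass steps can be sketched as follows. The inputs (2, 3) and expected output 1 come from the figure; every weight and bias below is a made-up value for illustration, and sigmoid is assumed as the activation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Compute one layer: weights[j] holds the incoming weights of neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# 1. Input layer: raw data from the figure.
x = [2.0, 3.0]

# Hypothetical parameters (not from the slides).
W_hidden = [[0.1, -0.2], [0.4, 0.3]]
b_hidden = [0.0, -0.5]
W_out = [[0.7, -0.6]]
b_out = [0.1]

# 2. Hidden layer: z = sum(w_i * x_i) + b, then h = activation(z).
h = layer_forward(x, W_hidden, b_hidden)

# 3. Output layer: hidden outputs become the inputs.
y_hat = layer_forward(h, W_out, b_out)[0]

# 4. Final prediction compared with the expected output of 1.
error = 0.5 * (1.0 - y_hat) ** 2
print(f"hidden = {h}, prediction = {y_hat:.3f}, error = {error:.3f}")
```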


Step 2: Calculate Error

Measuring how wrong our prediction is

Error Metrics

Mean Squared Error

E = ½(Y - Ŷ)²  (error for one sample; the mean is taken over all training samples)

Where Y = expected, Ŷ = predicted

Why Square the Error?

  • Always positive
  • Penalizes large errors more
  • Mathematically convenient

Interactive Error Calculation

Customer Purchase Prediction Example

Expected Output
1

Will buy

Network Prediction
0.3

30% confidence

Error Calculation:

E = ½(1 - 0.3)² = ½(0.7)² = 0.245

Goal: Adjust weights to minimize this error!

Step 3: Backward Propagation

Error Flows Backward

Xโ‚ Xโ‚‚ Hโ‚ ฮดโ‚=? Hโ‚‚ ฮดโ‚‚=? Y ฮดโ‚’=-0.7

Understanding Gradients

What are Gradients?

Gradients tell us:

  • Direction: Which way to change weights
  • Magnitude: How much to change them

Chain Rule in Action

∂E/∂w = ∂E/∂y × ∂y/∂z × ∂z/∂w

How changing weight w affects final error E

Error Attribution

Each weight gets "blame" proportional to its contribution to the error
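
As a sanity check on the chain rule, the sketch below computes the gradient for a single sigmoid neuron two ways: analytically, as the product of the three factors, and numerically, by nudging the weight slightly. The neuron, its values, and the function names are assumptions for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, b, target):
    """E = 1/2 * (target - y)^2 for one neuron with y = sigmoid(w*x + b)."""
    y = sigmoid(w * x + b)
    return 0.5 * (target - y) ** 2


def analytic_grad(w, x, b, target):
    """dE/dw = dE/dy * dy/dz * dz/dw."""
    y = sigmoid(w * x + b)
    dE_dy = -(target - y)      # derivative of 1/2 * (target - y)^2
    dy_dz = y * (1.0 - y)      # derivative of the sigmoid
    dz_dw = x                  # derivative of w*x + b with respect to w
    return dE_dy * dy_dz * dz_dw


w, x, b, target = 0.5, 2.0, 0.1, 1.0   # hypothetical values
eps = 1e-6
numeric = (loss(w + eps, x, b, target) - loss(w - eps, x, b, target)) / (2 * eps)
analytic = analytic_grad(w, x, b, target)
print(f"analytic = {analytic:.6f}, numeric = {numeric:.6f}")
```

The negative gradient here says that increasing w would reduce the error, which matches intuition: the prediction is below the target and the input is positive.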


Step 4: Gradient Descent - The Optimization Method

Gradient Descent Visualization

[Figure: error surface plotted as Error against Weight Value, marking the current weight, the gradient direction, and the global minimum (best weights)]

Weight Update Rule

w_new = w_old - η × ∂E/∂w

Learning Rate (η)

Controls how big steps we take:

  • Too large: Might overshoot minimum
  • Too small: Learning takes forever
  • Just right: Steady convergence

Example Update:

If weight = 0.5, gradient = -0.3, learning rate = 0.1:

New weight = 0.5 - 0.1 × (-0.3) = 0.53
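
The update rule and the effect of the learning rate can be sketched on a toy problem. The error function E(w) = (w - 3)², with its minimum at w = 3, is an assumption chosen only because its gradient is easy to write down; the three learning rates are illustrative.

```python
def update(weight, gradient, lr):
    """One gradient descent step: w_new = w_old - lr * dE/dw."""
    return weight - lr * gradient


def descend(lr, start=0.0, steps=50):
    """Minimise the toy error E(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        w = update(w, 2.0 * (w - 3.0), lr)
    return w


# The example update from the slide.
print(update(0.5, -0.3, 0.1))

# Same number of steps, three different learning rates.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: w after 50 steps = {descend(lr):.3f}")
```

With the small rate the weight has barely moved toward 3, with the moderate rate it has essentially arrived, and with the large rate each step overshoots further than the last.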


Complete Backpropagation Algorithm

The complete learning process in action

Algorithm Steps

1. Initialize

Set random weights between -0.5 and 0.5

2. Forward Pass

Calculate all neuron outputs from input to output layer

3. Calculate Error

Compare prediction with expected output

4. Backward Pass

Calculate gradients for all weights using chain rule

5. Update Weights

Adjust all weights using gradient descent

6. Repeat

Continue until error is minimized or max epochs reached
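
The six steps can be put together in a short, dependency-free sketch that trains a small network on XOR. The hidden layer size, learning rate, epoch count, and random seed are all assumptions for this example, and results will vary with them; treat it as an illustration of the loop rather than a tuned implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=4, epochs=10000, lr=0.5, seed=0):
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    # 1. Initialize: random weights between -0.5 and 0.5.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b_o = rng.uniform(-0.5, 0.5)

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_h, b_h)]
        y = sigmoid(sum(wo * hj for wo, hj in zip(w_o, h)) + b_o)
        return h, y

    losses = []
    for _ in range(epochs):                       # 6. Repeat
        total = 0.0
        for x, t in data:
            h, y = forward(x)                     # 2. Forward pass
            total += 0.5 * (t - y) ** 2           # 3. Calculate error
            # 4. Backward pass: chain rule through output, then hidden layer.
            d_out = (y - t) * y * (1 - y)
            d_hid = [d_out * w_o[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # 5. Update weights with gradient descent.
            for j in range(hidden):
                w_o[j] -= lr * d_out * h[j]
                b_h[j] -= lr * d_hid[j]
                w_h[j][0] -= lr * d_hid[j] * x[0]
                w_h[j][1] -= lr * d_hid[j] * x[1]
            b_o -= lr * d_out
        losses.append(total)
    return losses, [forward(x)[1] for x, _ in data]


losses, predictions = train_xor()
print(f"first epoch error = {losses[0]:.3f}, last epoch error = {losses[-1]:.3f}")
print("predictions for (0,0), (0,1), (1,0), (1,1):", [round(p, 2) for p in predictions])
```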

Live Training Demo

Training XOR Function

Epoch: 0

Error: 1.000

Current Training Sample:

[0, 0] → Expected: 0, Predicted: 0.5

Weight Updates This Step:

w₁: 0.23 → 0.31 (+0.08)

w₂: -0.15 → -0.09 (+0.06)

w₃: 0.44 → 0.39 (-0.05)

Training Progress

Training Multilayer Networks: The Complete Process

Training Phases

🔄 Epoch

One complete pass through all training data

Typically need 100-1000+ epochs

📊 Batch Processing

Process multiple samples before updating weights

Improves stability and efficiency

⚡ Learning Rate Schedule

Adjust learning rate during training

Start large, decrease over time

🛑 Stopping Criteria

When to stop training:

  • Error below threshold
  • Max epochs reached
  • No improvement
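
The three stopping criteria can be combined into one check that runs after every epoch. The function name, the default threshold, and the "patience" window (how many epochs to wait for an improvement) are hypothetical choices for this sketch.

```python
def should_stop(val_losses, epoch, max_epochs=1000, threshold=0.01, patience=10):
    """Return the reason to stop training, or None to keep going."""
    # Criterion 1: error below threshold.
    if val_losses and val_losses[-1] < threshold:
        return "error below threshold"
    # Criterion 2: max epochs reached.
    if epoch >= max_epochs:
        return "max epochs reached"
    # Criterion 3: no improvement over the last `patience` epochs.
    if len(val_losses) > patience:
        best_before = min(val_losses[:-patience])
        if min(val_losses[-patience:]) >= best_before:
            return "no improvement"
    return None


print(should_stop([0.5, 0.005], epoch=2))
print(should_stop([0.5, 0.4] + [0.45] * 10, epoch=12))
print(should_stop([0.5, 0.4, 0.3], epoch=3))
```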


Training Challenges & Solutions

⚠️ Common Problems

  • Vanishing Gradients: Gradients become too small in deep networks
  • Overfitting: Network memorizes training data
  • Local Minima: Getting stuck in suboptimal solutions
  • Slow Convergence: Taking too long to learn

Warning Signs:

Training accuracy 95%, Test accuracy 60% = Overfitting!

✅ Solutions

  • Better Activations: ReLU instead of sigmoid
  • Regularization: Prevent overfitting
  • Momentum: Help escape local minima
  • Adaptive Learning: Adjust rates automatically

Modern Techniques:

Dropout, Batch Normalization, Adam optimizer

💡 Best Practices

  • Start Simple: Few layers, then add complexity
  • Monitor Both: Training and validation loss
  • Early Stopping: Stop when validation loss increases
  • Cross-Validation: Test on unseen data

Golden Rule:

"Make it work, then make it better"

💼 Business Impact

Example: Netflix uses neural networks to recommend movies. Better training = better recommendations = happier customers = more revenue!

🧮 Exercise: Manual Backpropagation Calculation

Given Network

[Figure: inputs X₁ = 2 and X₂ = 1 feed one hidden neuron H (bias b = 0.5), which feeds output neuron Y (bias b = 0.2); weights w₁ = 0.6, w₂ = 0.3, w₃ = 0.8; expected output 1]

📋 Given Information

Activation: Sigmoid function

Learning Rate: η = 0.5

Expected Output: 1

Step-by-Step Calculation

Part A: Forward Pass

1. Calculate hidden layer input:

z_h = ____ = ____

2. Apply sigmoid activation:

h = σ(____) = ____

3. Calculate output:

y = σ(____) = ____

Part B: Backward Pass

1. Calculate output error:

δ_out = ____

2. Calculate hidden error:

δ_h = ____

Part C: Weight Updates

1. Update w₃:

w₃_new = ____

2. Update w₁:

w₁_new = ____

3. Update w₂:

w₂_new = ____
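
After attempting the exercise by hand, the sketch below is one way to check your numbers. It rests on several assumptions: w₁ and w₂ connect X₁ and X₂ to the hidden neuron, w₃ connects the hidden neuron to the output, the error is E = ½(Y - Ŷ)², and each δ is the derivative of E with respect to that neuron's weighted sum (the same sign convention as δₒ = -0.7 earlier). Only the weights are updated, as the exercise asks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Given values.
x1, x2, target = 2.0, 1.0, 1.0
w1, w2, w3 = 0.6, 0.3, 0.8
b_h, b_y = 0.5, 0.2
lr = 0.5

# Part A: forward pass.
z_h = w1 * x1 + w2 * x2 + b_h
h = sigmoid(z_h)
y = sigmoid(w3 * h + b_y)

# Part B: backward pass.
delta_out = (y - target) * y * (1 - y)     # dE/dz at the output neuron
delta_h = delta_out * w3 * h * (1 - h)     # dE/dz at the hidden neuron

# Part C: weight updates, w_new = w_old - lr * gradient.
w3_new = w3 - lr * delta_out * h
w1_new = w1 - lr * delta_h * x1
w2_new = w2 - lr * delta_h * x2

print(f"z_h = {z_h:.2f}, h = {h:.4f}, y = {y:.4f}")
print(f"delta_out = {delta_out:.4f}, delta_h = {delta_h:.4f}")
print(f"w3_new = {w3_new:.4f}, w1_new = {w1_new:.4f}, w2_new = {w2_new:.4f}")
```

All three weights increase slightly, which makes sense: the prediction is below the target of 1, and every input to every weight is positive.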

MLPs: Solving Non-Linear Problems

Hidden layers enable neural networks to learn complex patterns

Linearly Separable

Single Layer Sufficient

One line separates classes

Complex Pattern

Multiple Layers Needed

Circular decision boundary

Real-World Data

Deep Networks Required

Irregular, complex boundaries

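🎯 Universal Approximation Theorem

A neural network with just one hidden layer and a non-linear activation can approximate any continuous function on a bounded input range to any desired accuracy, given enough neurons! The theorem says such a network exists; it does not say that training will find it.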

Neural Network Terminology: Clearing the Confusion

Understanding the relationship between different terms

[Figure: Neural Network (umbrella term) containing Multilayer Perceptron (MLP), Perceptron (single layer), CNN (convolutional), and RNN (recurrent)]

Term | Definition | Layers | Capability
Perceptron | Single artificial neuron | Input → Output | Linear classification only
MLP | Multiple layers of perceptrons | Input → Hidden(s) → Output | Non-linear problems
Neural Network | General term for all architectures | Various architectures | Depends on architecture

Feedforward Neural Networks

Architecture

xโ‚ xโ‚‚ xโ‚ƒ y Input Hidden 1 Hidden 2 Output

Feedforward Properties

Characteristics

  • Unidirectional: Information flows forward only
  • No loops: No feedback connections
  • Layer-wise: Organized in distinct layers
  • Dense connections: Neurons connect to all neurons in next layer

Applications

  • Image classification
  • Regression problems
  • Pattern recognition
  • Feature extraction

Real Example

Credit Scoring: Input customer data → Hidden layers learn risk patterns → Output credit score

Convolutional Neural Networks (CNNs)

CNN Architecture

[Figure: CNN pipeline. Image (32×32×3) → Conv layer (feature maps) → Pool (downsample) → Conv → FC layer → "Cat"]

CNN Key Features

Convolutional Layers

  • Filters/Kernels: Detect specific patterns
  • Local connectivity: Each neuron sees small region
  • Parameter sharing: Same filter across entire image

Pooling Layers

  • Dimensionality reduction: Smaller feature maps
  • Translation invariance: Robust to small shifts
  • Max/Average pooling: Keep strongest features
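
A convolutional layer followed by max pooling can be sketched in plain Python. The 5×5 image and the 2×2 vertical-edge filter are made up for this example; real CNNs learn their filter values during training and, like this sketch, usually compute a cross-correlation (the filter is not flipped).

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# The same filter is reused at every position (parameter sharing).
edge_filter = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, edge_filter)
pooled = max_pool(feature_map)
print(feature_map)
print(pooled)
```

The feature map is non-zero only where the dark-to-bright edge sits, and pooling halves each dimension while keeping that response.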

Perfect for:

Image classification, facial recognition, medical imaging, autonomous vehicles

Recurrent Neural Networks (RNNs)

RNN Architecture

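[Figure: RNN unrolled over time steps t-1, t, t+1. Each step takes an input (x₁, x₂, x₃), produces an output (y₁, y₂, y₃), and passes a hidden state to the next step]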

RNN Applications

Sequential Data Processing

  • Memory: Maintains information from previous inputs
  • Variable length: Handles sequences of any length
  • Temporal patterns: Learns patterns over time
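
The "memory" of an RNN is just a hidden state that is fed back in at every step. The sketch below uses a single scalar hidden state with a tanh activation; the weights are hypothetical and fixed rather than learned.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """New hidden state from the current input and the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Apply the same step (same weights) to every element of the sequence."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# A single "1" at the start still influences later hidden states.
print(run_rnn([1.0, 0.0, 0.0]))
print(run_rnn([0.0, 0.0, 0.0]))
```

Because the same weights are reused at each step, the function accepts a sequence of any length.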

Real-World Uses

  • Language Translation: "Hello" → "Hola"
  • Speech Recognition: Audio → Text
  • Stock Prediction: Price history → Future price
  • Sentiment Analysis: Text → Emotion

Example: Text Analysis

Input: "The movie was really..."

RNN predicts: "good" (based on context)

Generative Adversarial Networks (GANs)

GAN Architecture

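[Figure: GAN. Random noise feeds Generator G, which produces fake data; Discriminator D receives both generated (fake) and real training data and labels each as real or fake; the adversarial loss trains both networks]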

How GANs Work

Generator Network

Creates fake data from random noise

Goal: Fool the discriminator

Discriminator Network

Distinguishes real from fake data

Goal: Catch the generator's fakes

Adversarial Training

Two networks compete and improve together

Applications

  • Generating realistic images
  • Style transfer
  • Data augmentation
  • Creating deepfakes

Transformer Networks

Transformer Architecture

[Figure: Transformer. Input sequence ("I am a student") → Self-Attention layer → Feed Forward → Output]

Transformer Innovation

Self-Attention Mechanism

  • Parallel processing: All positions simultaneously
  • Long-range dependencies: Connect distant words
  • No sequential bottleneck: Unlike RNNs
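
Self-attention can be sketched as scaled dot-product attention over a handful of token vectors. This simplified version uses the token vectors directly as queries, keys, and values; a real Transformer first multiplies them by learned projection matrices, which are omitted here. The toy vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X."""
    d = len(X[0])
    outputs, weights = [], []
    for q in X:  # each position is independent, so these could run in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * v[c] for a, v in zip(attn, X)) for c in range(d)])
    return outputs, weights


# Three toy token vectors; tokens 0 and 2 are identical, token 1 is different.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(X)
for row in weights:
    print([round(a, 3) for a in row])
```

Token 0 attends equally to itself and to token 2, even though they are not neighbours, which is the long-range connection described above.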

Revolutionary Applications

  • ChatGPT: Conversational AI
  • Google Translate: Language translation
  • DALL-E: Text-to-image generation
  • Code generation: GitHub Copilot

Example: Attention in Action

"The animal didn't cross the street because it was too tired"

Attention helps determine that "it" refers to "animal"

Other Specialized Neural Networks

Radial Basis Function Networks

Uses: Function approximation, time series prediction

Radial basis functions as activation functions

Self-Organizing Maps

Uses: Clustering, data visualization

Maps high-dimensional data to 2D grid

Modular Neural Networks

[Figure: Module 1 and Module 2 feed a Coordinator, which produces the output]

Uses: Complex tasks with subtasks

Specialized modules for different parts

Autoencoders

[Figure: Encoder → Bottleneck → Decoder]

Uses: Dimensionality reduction, denoising

Learns compressed representations

Choosing the Right Neural Network Architecture

Different problems require different network types

Data Type | Problem | Best Architecture | Example Application
Images | Object detection | CNN | Medical X-ray diagnosis
Text/Sequences | Language processing | Transformer/RNN | Language translation
Tabular Data | Prediction/Classification | Feedforward/MLP | Customer churn prediction
Time Series | Forecasting | RNN/LSTM | Stock price prediction
Generation Tasks | Create new data | GAN/VAE | Art generation
Dimensionality Reduction | Data compression | Autoencoder | Data visualization

Decision Framework

Ask yourself: What type of data do I have? What am I trying to predict or generate? How complex are the patterns?

Neural Network Performance Comparison

Real-world performance data across different tasks

Image Classification (ImageNet)

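Method | Year | Top-1 Accuracy (approx.)
Traditional ML | 2010 | about 53% (the 26% often quoted is a top-5 error rate, not an accuracy)
AlexNet (CNN) | 2012 | 63%
ResNet (Deep CNN) | 2015 | 77%
EfficientNet | 2019 | 84%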

Key Performance Insights

Computational Requirements

  • Perceptron: Minutes on laptop
  • MLP: Hours on laptop
  • CNN: Hours to days on GPU
  • Large Transformers: Weeks on GPU clusters

Data Requirements

  • Simple problems: 1,000 samples
  • Image classification: 10,000+ samples
  • Language models: Millions of samples
  • Large language models: Billions of samples

Accuracy Trends

  • More data: Generally better performance
  • Deeper networks: Can learn more complex patterns
  • Specialized architectures: Optimized for specific tasks

Neural Networks in Industry

Healthcare

  • Medical Imaging: Detecting cancer in X-rays
  • Drug Discovery: Predicting molecular properties
  • Diagnosis: Symptom analysis and recommendations
  • Genomics: DNA sequence analysis

Impact: Google's AI can detect diabetic retinopathy with 90% accuracy

Finance

  • Fraud Detection: Identifying suspicious transactions
  • Credit Scoring: Assessing loan default risk
  • Algorithmic Trading: Automated investment decisions
  • Risk Management: Portfolio optimization

Impact: JPMorgan's AI processes $6 trillion in transactions daily

Technology

  • Recommendation Systems: Netflix, Amazon, Spotify
  • Search Engines: Google's ranking algorithms
  • Voice Assistants: Siri, Alexa speech recognition
  • Autonomous Vehicles: Tesla's autopilot

Impact: Netflix estimates its recommendation system saves $1B annually

Market Size

Global AI Market: $150B in 2023 → Expected $1.3T by 2030

Future of Neural Networks

Emerging Trends

Neuromorphic Computing

Hardware that mimics brain structure

1000x more energy efficient

Quantum Neural Networks

Quantum computing meets neural networks

Exponential speedup potential

Few-Shot Learning

Learning from very few examples

Like human learning

Explainable AI

Understanding why networks make decisions

Critical for healthcare, finance

Multimodal Models

Processing text, images, audio together

Like GPT-4 with vision

Challenges Ahead

Technical Challenges

  • Energy consumption of large models
  • Bias and fairness in AI systems
  • Robustness and security
  • Catastrophic forgetting

Ethical Considerations

  • Job displacement concerns
  • Privacy and surveillance
  • Deepfakes and misinformation
  • AI safety and control

Research Directions

  • More efficient training methods
  • Better interpretability tools
  • Continual learning systems
  • Human-AI collaboration

Getting Started: Practical Implementation

Tools and Frameworks

Python Libraries

  • TensorFlow: Google's comprehensive framework
  • PyTorch: Facebook's research-friendly library
  • Keras: High-level, user-friendly API
  • Scikit-learn: Traditional ML algorithms

Cloud Platforms

  • Google Colab: Free GPU access
  • AWS SageMaker: Production ML platform
  • Azure ML: Microsoft's ML service
  • Kaggle: Competitions and datasets

Getting Started

  • Start with simple problems
  • Use pre-built datasets
  • Follow online tutorials
  • Join ML communities

Development Workflow

1. Data Collection & Preparation

Clean, preprocess, and split your data

2. Choose Architecture

Select appropriate network type

3. Start Simple

Begin with a basic model, add complexity gradually

4. Train & Validate

Monitor both training and validation metrics

5. Hyperparameter Tuning

Optimize learning rate, architecture, etc.

6. Deploy & Monitor

Put model into production and track performance

Common Mistakes & Best Practices

Common Mistakes

1. Using Too Complex Models

Starting with 100-layer networks for simple problems

Start simple, add complexity as needed

2. Insufficient Data

Trying to train deep networks on tiny datasets

Use simpler models or data augmentation

3. Ignoring Validation

Only looking at training accuracy

Always monitor test/validation performance

4. Poor Preprocessing

Not normalizing inputs or handling missing data

Preprocessing is crucial for good performance

5. Wrong Learning Rate

Learning rate too high (exploding) or too low (slow)

Start with 0.001, adjust based on loss curves

Best Practices

1. Start with Baselines

Implement simple models first (logistic regression, SVM)

Beat the baseline before going complex

2. Understand Your Data

Explore distributions, correlations, missing values

Good data beats fancy algorithms

3. Monitor Everything

Track loss, accuracy, gradients, weights

Use tools like TensorBoard

4. Use Cross-Validation

Multiple train/test splits for robust evaluation

K-fold CV gives better estimates

5. Document Everything

Keep track of experiments, hyperparameters, results

Reproducibility is key in ML

Next Week: Deep Dive into CNNs

Week 8: Image Classification with Deep Convolutional Neural Networks

What We'll Build

Image Classifier Project

[Figure: cat and dog images fed into a CNN, which outputs a prediction]

Dataset: 10,000 cat and dog images

Goal: 85%+ accuracy

Tools: Python, TensorFlow, Keras

Learning Objectives

Technical Skills

  • Building CNN architectures
  • Data preprocessing for images
  • Training deep networks
  • Evaluating model performance

Practical Experience

  • Handling real-world datasets
  • Debugging training issues
  • Hyperparameter optimization
  • Model deployment basics

Preparation for Next Week

Review matrix operations and Python basics

Install TensorFlow/Keras if working locally

Summary: Neural Networks Fundamentals

What we learned today

Core Concepts

  • Artificial neurons mimic biology
  • Weights and biases are learnable parameters
  • Activation functions enable non-linearity
  • Training finds optimal parameters

Key Algorithms

  • Perceptron: Single neuron classifier
  • MLP: Multiple layers for complex patterns
  • Backpropagation: The learning algorithm
  • Gradient descent: Optimization method

Network Types

  • Feedforward: Basic architecture
  • CNN: For image processing
  • RNN: For sequential data
  • Transformers: For language tasks

Remember: Neural networks are powerful tools, but they need:

Good data + Appropriate architecture + Proper training = Success

Questions & Discussion


Discussion Topics

  • Which neural network type would you choose for your business problem?
  • What challenges do you anticipate in implementing neural networks?
  • How might neural networks impact your industry?
  • What ethical considerations should we keep in mind?

Next Week: Hands-on CNN implementation for image classification