Artificial Intelligence and Machine Learning
Kaplan Business School
Understanding when and why we need neural networks
Building blocks inspired by biology
Weights, biases, and activation functions
Learning from examples step by step
Overcoming limitations with hidden layers
The learning algorithm that powers neural networks
Different types for different problems
ImageNet Challenge: Traditional ML accuracy ~26% → Deep Learning accuracy >95%
Deep Learning = Networks of Artificial Neurons
Deep Learning = Many neurons working together to learn complex patterns
Customer data fed into the neuron
Example: Age, Income, Purchase history
Importance of each input connection
Higher weight = more influence on output
Adjusts the neuron's sensitivity
Shifts the decision boundary
🎯 Training Goal: Find the best weights and biases!
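A minimal Python sketch of how one neuron combines inputs, weights, and a bias into a weighted sum. The feature values, weights, and bias below are hypothetical and chosen only for illustration; the inputs are assumed to be already scaled to the 0-1 range.

```python
# Hypothetical, pre-scaled customer features: age, income, purchase history
inputs = [0.4, 0.7, 0.9]
# Hypothetical weights: a larger weight gives that input more influence
weights = [0.2, 0.5, 1.5]
# Hypothetical bias: shifts the point at which the neuron starts to activate
bias = -1.0


def weighted_sum(xs, ws, b):
    """Return z = sum(w_i * x_i) + b for a single neuron."""
    return sum(w * x for w, x in zip(ws, xs)) + b


z = weighted_sum(inputs, weights, bias)
print(f"Weighted sum z = {z:.2f}")
```

Training would adjust `weights` and `bias`; here they are fixed by hand.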
Activation functions determine when a neuron should "fire" or activate
Output: 0 or 1
Used in perceptrons
f(x) = max(0, x)
Most popular today
f(x) = 1/(1 + e⁻ˣ)
Output: 0 to 1
Input: Customer spends $500 → Weighted sum: 2.1 → ReLU(2.1) = 2.1 → Likely to purchase!
Input: Customer spends $50 → Weighted sum: -0.3 → ReLU(-0.3) = 0 → Unlikely to purchase
ReLU(x) = max(0, x)
1. Calculate weighted sum:
2. Apply ReLU activation:
Sigmoid(x) = 1/(1 + e⁻ˣ)
1. Use the same weighted sum z =
2. Apply Sigmoid activation:
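The three activation functions can be sketched in Python as follows. The weighted sums 2.1 and -0.3 are reused from the customer spending example; the value of z in the worked steps did not survive on the slide, so applying these functions to those two numbers is an assumption. The step threshold at zero is also an assumed convention.

```python
import math


def step(z):
    """Step activation: outputs 0 or 1 (assumed threshold at zero)."""
    return 1 if z > 0 else 0


def relu(z):
    """ReLU(x) = max(0, x)."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid(x) = 1 / (1 + e^-x), output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Weighted sums taken from the customer spending example
for z in (2.1, -0.3):
    print(f"z={z:+.1f}  step={step(z)}  relu={relu(z):.1f}  sigmoid={sigmoid(z):.3f}")
```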
Single-layer perceptrons can only solve linearly separable problems
Perceptron can solve this!
Perceptron cannot solve this!
This limitation contributed to the first "AI Winter" in the 1970s. Solution: Multilayer Neural Networks!
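A rough Python sketch of the limitation, using the classic perceptron learning rule. The slides do not say which problems the two figures show, so AND is assumed here as the linearly separable case and XOR as the non-separable one. The epoch count and learning rate are arbitrary choices.

```python
def predict(weights, bias, x):
    """Step-activated perceptron: 1 if the weighted sum is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(data, epochs=25, lr=1):
    """Perceptron rule: nudge weights by (target - prediction) * input."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def accuracy(data, weights, bias):
    """Fraction of samples classified correctly."""
    return sum(predict(weights, bias, x) == t for x, t in data) / len(data)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # separable
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # not separable

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print(f"AND accuracy: {and_acc:.2f}")
print(f"XOR accuracy: {xor_acc:.2f}")
```

No single line separates the XOR classes, so the perceptron cannot classify all four XOR points correctly however long it trains.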
Each input connects to all hidden neurons
Hidden neurons connect to all outputs
Information flows in one direction only
Every neuron connects to every neuron in next layer
Watch how hidden layers enable complex pattern recognition
| X₁ | X₂ | Y |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
Hidden layers create feature representations that make non-linear problems linearly separable!
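One way to see this is a tiny network with hand-picked weights, sketched below in Python. The weights and thresholds are illustrative assumptions, not values from the slides: one hidden neuron behaves like OR, the other like AND, and the output neuron fires for "OR but not AND", which is exactly XOR.

```python
def step(z):
    """Step activation with an assumed threshold at zero."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    """Two hidden neurons build features that make XOR linearly separable."""
    h1 = step(x1 + x2 - 0.5)    # hidden feature 1: at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)    # hidden feature 2: both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)  # output: OR is true but AND is not


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"X1={x1} X2={x2} -> Y={xor_network(x1, x2)}")
```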
How do we train networks with hidden layers? We can't directly see what hidden neurons should output!
Backpropagation - propagate errors backward through the network to update all weights.
1. Forward Pass: Calculate predictions
2. Backward Pass: Calculate and propagate errors
Feed raw data into network
z = Σ(wᵢ × xᵢ) + b
h = activation(z)
Use hidden layer outputs as inputs
Compare with expected output
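The forward pass steps above can be sketched in Python for a small network with two inputs, two hidden neurons, and one output. The layer sizes, weights, input, and the choice of sigmoid as the activation are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """For each neuron: z = sum(w_i * x_i) + b, then h = activation(z)."""
    outputs = []
    for neuron_weights, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        outputs.append(sigmoid(z))
    return outputs


# Hypothetical input and parameters
x = [1.0, 0.0]
hidden_w = [[0.3, -0.2], [0.4, 0.1]]  # one row of weights per hidden neuron
hidden_b = [0.0, 0.0]
output_w = [[0.5, -0.4]]              # a single output neuron
output_b = [0.0]

hidden = layer_forward(x, hidden_w, hidden_b)                # input -> hidden
prediction = layer_forward(hidden, output_w, output_b)[0]    # hidden -> output
expected = 1.0
print(f"Hidden outputs: {[round(h, 3) for h in hidden]}")
print(f"Prediction: {prediction:.3f}, expected: {expected}")
```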
Measuring how wrong our prediction is
Where Y = expected, Ŷ = predicted
Will buy
30% confidence
Goal: Adjust weights to minimize this error!
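A short Python sketch of the error calculation. The error formula itself did not survive on the slide, so squared error, E = ½(Y − Ŷ)², is assumed here. The numbers read the example as expected output 1 ("will buy") and predicted output 0.30.

```python
def squared_error(expected, predicted):
    """E = 1/2 * (Y - Y_hat)^2 for one prediction (assumed loss function)."""
    return 0.5 * (expected - predicted) ** 2


expected = 1.0    # Y: the customer will buy
predicted = 0.30  # Y_hat: the network is only 30% confident
error = squared_error(expected, predicted)
print(f"Error = {error:.3f}")
```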
Gradients tell us:
∂E/∂w = ∂E/∂y × ∂y/∂z × ∂z/∂w
How changing weight w affects final error E
Each weight gets "blame" proportional to its contribution to the error
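The chain rule can be checked numerically. The Python sketch below assumes a single sigmoid neuron with squared error E = ½(Y − y)², with hypothetical values for the weight, input, bias, and target. It compares the chain-rule gradient with a finite-difference estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def error(w, x, b, target):
    """Squared error of one sigmoid neuron: E = 1/2 * (target - y)^2."""
    y = sigmoid(w * x + b)
    return 0.5 * (target - y) ** 2


def analytic_gradient(w, x, b, target):
    """dE/dw = dE/dy * dy/dz * dz/dw, one factor per link in the chain."""
    y = sigmoid(w * x + b)
    dE_dy = y - target     # derivative of the squared error
    dy_dz = y * (1.0 - y)  # derivative of the sigmoid
    dz_dw = x              # derivative of z = w*x + b with respect to w
    return dE_dy * dy_dz * dz_dw


def numeric_gradient(w, x, b, target, eps=1e-6):
    """Central finite difference: how E changes when w is nudged."""
    return (error(w + eps, x, b, target) - error(w - eps, x, b, target)) / (2 * eps)


# Hypothetical values
w, x, b, target = 0.5, 1.5, 0.1, 1.0
grad_chain = analytic_gradient(w, x, b, target)
grad_numeric = numeric_gradient(w, x, b, target)
print(f"Chain rule: {grad_chain:.6f}, finite difference: {grad_numeric:.6f}")
```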
Controls how large a step we take:
If weight = 0.5, gradient = -0.3, learning rate = 0.1:
New weight = 0.5 - 0.1 × (-0.3) = 0.53
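The same update written as a small Python function, using the numbers from the example above. The function name is a hypothetical helper.

```python
def update_weight(weight, gradient, learning_rate):
    """Gradient descent step: move against the gradient."""
    return weight - learning_rate * gradient


new_weight = update_weight(weight=0.5, gradient=-0.3, learning_rate=0.1)
print(f"New weight = {new_weight:.2f}")
```

A negative gradient means the error falls as the weight grows, so the update increases the weight.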
Position: High error region
Negative gradient: Points toward the minimum
Next step: Move down the slope
The complete learning process in action
Set random weights between -0.5 and 0.5
Calculate all neuron outputs from input to output layer
Compare prediction with expected output
Calculate gradients for all weights using chain rule
Adjust all weights using gradient descent
Continue until error is minimized or max epochs reached
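The six steps above can be sketched end to end in plain Python on the XOR data. The hidden-layer size, learning rate, epoch count, random seed, sigmoid activation, and squared-error loss are all assumptions made for this sketch; only the initial weight range of -0.5 to 0.5 comes from the slides.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
HIDDEN = 4           # assumed hidden-layer size
LEARNING_RATE = 0.5  # assumed
MAX_EPOCHS = 5000    # assumed

rng = random.Random(0)  # fixed seed so runs are repeatable

# Step 1: set random weights between -0.5 and 0.5
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(HIDDEN)]
b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
b_out = rng.uniform(-0.5, 0.5)


def forward(x):
    """Step 2: calculate all neuron outputs from input to output layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    y = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, y


def total_error():
    """Sum of squared errors over the whole training set."""
    return sum(0.5 * (t - forward(x)[1]) ** 2 for x, t in XOR)


initial_error = total_error()

for epoch in range(MAX_EPOCHS):  # Step 6: repeat for a fixed number of epochs
    for x, target in XOR:
        hidden, y = forward(x)
        # Steps 3-4: compare with the target, then apply the chain rule
        delta_out = (y - target) * y * (1 - y)
        delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                        for j in range(HIDDEN)]
        # Step 5: adjust all weights using gradient descent
        for j in range(HIDDEN):
            w_out[j] -= LEARNING_RATE * delta_out * hidden[j]
            b_hidden[j] -= LEARNING_RATE * delta_hidden[j]
            for i in range(2):
                w_hidden[j][i] -= LEARNING_RATE * delta_hidden[j] * x[i]
        b_out -= LEARNING_RATE * delta_out

final_error = total_error()
print(f"Error before training: {initial_error:.3f}")
print(f"Error after training:  {final_error:.3f}")
for x, target in XOR:
    print(f"{list(x)} -> Expected: {target}, Predicted: {forward(x)[1]:.2f}")
```

This version updates the weights after every sample; processing samples in batches before each update is a common alternative.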
Epoch: 0
Error: 1.000
[0, 0] → Expected: 0, Predicted: 0.5
w₁: 0.23 → 0.31 (+0.08)
w₂: -0.15 → -0.09 (+0.06)
w₃: 0.44 → 0.39 (-0.05)
Training Progress
One complete pass through all training data
Typically need 100-1000+ epochs
Process multiple samples before updating weights
Improves stability and efficiency
Adjust learning rate during training
Start large, decrease over time
When to stop training:
Training accuracy 95%, Test accuracy 60% = Overfitting!
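Two of the ideas above, decreasing the learning rate over time and deciding when to stop, can be sketched in Python. The decay formula, the patience value, and the validation losses are hypothetical; the slides do not specify a particular schedule or stopping rule.

```python
def decayed_learning_rate(initial_rate, epoch, decay=0.01):
    """Inverse-time decay (one possible schedule): start large, shrink over time."""
    return initial_rate / (1.0 + decay * epoch)


def early_stopping_epoch(validation_losses, patience=2):
    """Return the epoch at which training stops: the validation loss has
    failed to improve for `patience` epochs in a row."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1  # never triggered: ran to the end


# Hypothetical validation losses: improving at first, then getting worse
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch = early_stopping_epoch(val_losses)
print(f"Stop at epoch {stop_epoch}")
print(f"Learning rate at epoch 100: {decayed_learning_rate(0.1, 100):.3f}")
```

A validation loss that rises while the training loss keeps falling is the same overfitting signal as the 95% versus 60% accuracy gap.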
Dropout, Batch Normalization, Adam optimizer
"Make it work, then make it better"
Example: Netflix uses neural networks to recommend movies. Better training = better recommendations = happier customers = more revenue!
Activation: Sigmoid function
Learning Rate: η = 0.5
Expected Output: 1
1. Calculate hidden layer input:
2. Apply sigmoid activation:
3. Calculate output:
1. Calculate output error:
2. Calculate hidden error:
1. Update w₁:
2. Update w₂:
3. Update w₃:
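The numbers for this worked example did not survive on the slide, so the Python sketch below uses hypothetical inputs and starting weights. It assumes two inputs feeding one hidden sigmoid neuron (weights w1, w2) and one sigmoid output neuron (weight w3), with no biases. Only the sigmoid activation, η = 0.5, and the expected output of 1 come from the slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2, w3):
    """Forward pass: hidden input -> sigmoid -> output."""
    h = sigmoid(w1 * x[0] + w2 * x[1])
    y = sigmoid(w3 * h)
    return h, y


eta = 0.5                  # learning rate from the slide
expected = 1.0             # expected output from the slide
x = [1.0, 0.5]             # hypothetical inputs
w1, w2, w3 = 0.4, 0.2, 0.6 # hypothetical starting weights

# Forward pass
h, y = forward(x, w1, w2, w3)
error_before = 0.5 * (expected - y) ** 2

# Backward pass: output error first, then the hidden error
delta_out = (y - expected) * y * (1 - y)
delta_hidden = delta_out * w3 * h * (1 - h)

# Weight updates (each weight moves against its own gradient)
new_w1 = w1 - eta * delta_hidden * x[0]
new_w2 = w2 - eta * delta_hidden * x[1]
new_w3 = w3 - eta * delta_out * h

_, y_after = forward(x, new_w1, new_w2, new_w3)
error_after = 0.5 * (expected - y_after) ** 2
print(f"Output before: {y:.4f}, after one update: {y_after:.4f}")
print(f"Error before: {error_before:.4f}, after: {error_after:.4f}")
```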
Hidden layers enable neural networks to learn complex patterns
Single Layer Sufficient
One line separates classes
Multiple Layers Needed
Circular decision boundary
Deep Networks Required
Irregular, complex boundaries
A neural network with just one hidden layer can approximate any continuous function on a bounded input range to arbitrary accuracy, given enough neurons!
Understanding the relationship between different terms
| Term | Definition | Layers | Capability |
|---|---|---|---|
| Perceptron | Single artificial neuron | Input → Output | Linear classification only |
| MLP | Multiple layers of perceptrons | Input → Hidden(s) → Output | Non-linear problems |
| Neural Network | General term for all architectures | Various architectures | Depends on architecture |
Credit Scoring: Input customer data → Hidden layers learn risk patterns → Output credit score
Image classification, facial recognition, medical imaging, autonomous vehicles
Input: "The movie was really..."
RNN predicts: "good" (based on context)
Creates fake data from random noise
Goal: Fool the discriminator
Distinguishes real from fake data
Goal: Catch the generator's fakes
Two networks compete and improve together
"The animal didn't cross the street because it was too tired"
Attention helps determine that "it" refers to "animal"
Uses: Function approximation, time series prediction
Radial basis functions as activation functions
Uses: Clustering, data visualization
Maps high-dimensional data to 2D grid
Uses: Complex tasks with subtasks
Specialized modules for different parts
Uses: Dimensionality reduction, denoising
Learns compressed representations
Different problems require different network types
| Data Type | Problem | Best Architecture | Example Application |
|---|---|---|---|
| Images | Object detection | CNN | Medical X-ray diagnosis |
| Text/Sequences | Language processing | Transformer/RNN | Language translation |
| Tabular Data | Prediction/Classification | Feedforward/MLP | Customer churn prediction |
| Time Series | Forecasting | RNN/LSTM | Stock price prediction |
| Generation Tasks | Create new data | GAN/VAE | Art generation |
| Dimensionality Reduction | Data compression | Autoencoder | Data visualization |
Ask yourself: What type of data do I have? What am I trying to predict or generate? How complex are the patterns?
Real-world performance data across different tasks
| Method | Year | Accuracy |
|---|---|---|
| Traditional ML | 2010 | 26% |
| AlexNet (CNN) | 2012 | 63% |
| ResNet (Deep CNN) | 2015 | 77% |
| EfficientNet | 2019 | 84% |
Impact: Google's AI can detect diabetic retinopathy with 90% accuracy
Impact: JPMorgan's AI processes $6 trillion in transactions daily
Impact: Netflix estimates its recommendation system saves $1B annually
Global AI Market: $150B in 2023 → Expected $1.3T by 2030
Hardware that mimics brain structure
1000x more energy efficient
Quantum computing meets neural networks
Exponential speedup potential
Learning from very few examples
Like human learning
Understanding why networks make decisions
Critical for healthcare, finance
Processing text, images, audio together
Like GPT-4 with vision
Clean, preprocess, and split your data
Select appropriate network type
Begin with basic model, add complexity gradually
Monitor both training and validation metrics
Optimize learning rate, architecture, etc.
Put model into production and track performance
Starting with 100-layer networks for simple problems
Start simple, add complexity as needed
Trying to train deep networks on tiny datasets
Use simpler models or data augmentation
Only looking at training accuracy
Always monitor test/validation performance
Not normalizing inputs or handling missing data
Preprocessing is crucial for good performance
Learning rate too high (loss explodes) or too low (training is slow)
Start with 0.001, adjust based on loss curves
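As one example of the preprocessing point above, inputs can be standardized so that every feature has mean 0 and standard deviation 1. This Python sketch uses z-score scaling, which is one common choice and not necessarily the method intended in the slides. The income values are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


# Hypothetical raw feature on a very different scale from, say, age
incomes = [30000, 45000, 60000, 120000]
scaled_incomes = standardize(incomes)
print([round(v, 2) for v in scaled_incomes])
```

In practice the mean and standard deviation would be computed on the training split only and then reused for validation and test data.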
Implement simple models first (logistic regression, SVM)
Beat the baseline before going complex
Explore distributions, correlations, missing values
Good data beats fancy algorithms
Track loss, accuracy, gradients, weights
Use tools like TensorBoard
Multiple train/test splits for robust evaluation
K-fold CV gives better estimates
Keep track of experiments, hyperparameters, results
Reproducibility is key in ML
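The K-fold idea mentioned above can be sketched in plain Python. The function name is hypothetical, and this version does not shuffle; real projects would usually shuffle the data first or use a library routine.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample lands in the test set exactly once.
    """
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train, test) in enumerate(folds, start=1):
    print(f"Fold {fold_number}: train={train} test={test}")
```

Averaging the evaluation metric over the k test folds gives a steadier estimate than a single train/test split.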
Week 8: Image Classification with Deep Convolutional Neural Networks
Dataset: 10,000 cat and dog images
Goal: 85%+ accuracy
Tools: Python, TensorFlow, Keras
Review matrix operations and Python basics
Install TensorFlow/Keras if working locally
What we learned today
Good data + Appropriate architecture + Proper training = Success
Next Week: Hands-on CNN implementation for image classification