Decision Trees & Random Forests

Supervised Learning for Classification and Regression

Quick Poll: Have you made a decision tree today?

Yes, consciously

Yes, unconsciously

No

Not sure

Understand

How decision trees make predictions through hierarchical rule-based decisions

Build

Random forests to improve prediction accuracy and reduce overfitting

Apply

Tree-based methods to real-world classification and regression problems

Real-World Decision Making

How do you decide which movie to watch?

Netflix Recommendations

Personalized content filtering

Uses viewing history, ratings, and similar user behavior to predict preferences through sophisticated decision tree ensembles.

Loan Approvals

Financial risk assessment

Banks use income, credit score, employment history, and debt-to-income ratio to determine creditworthiness automatically.

Medical Diagnosis

Clinical decision support

Symptoms, test results, and patient history are processed through diagnostic trees to suggest potential conditions and treatments.

Key Insight: All these systems use the same fundamental approach - asking a series of structured questions to reach a decision, just like human reasoning but at scale.

The Classification Challenge

Can we teach a computer to classify customer responses?

Retail Campaign Dataset Sample

Age	Income (K)	Email Open Rate	Loyalty Member	Campaign Response
34	65.2	0.45	Yes	Responded
22	28.7	0.12	No	No Response
45	89.3	0.67	Yes	Responded

Which features do you think matter most for predicting campaign response?

Age

Income

Email Open Rate

Loyalty Membership

Today's Journey

From Simple Trees to Powerful Forests

Decision

Trees

Information

Theory

Random

Forests

Hands-on

Practice

Prediction: Which will be more accurate?

Single Decision Tree

Random Forest (Multiple Trees)

Timeline: 90 minutes to master tree-based machine learning

What is a Decision Tree?

A series of yes/no questions leading to an answer

Should I bring an umbrella?

Is it cloudy?

Yes → Bring it

No → Leave it

Check forecast?

Rain → Bring it

Sun → Leave it

Key Insight: Decision trees mirror human decision-making processes. We naturally think in terms of sequential questions and logical rules.

In machine learning, we formalize this intuitive process using mathematical criteria to determine the optimal sequence of questions.

Tree Anatomy

Understanding the structure

Income ≥ $50K?

Yes

Age ≥ 30?

Yes

Responded

No

No Response

No

Loyalty Member?

Yes

Responded

No

No Response

Tree Components

Root Node: First decision point (Income ≥ $50K?)
Internal Nodes: Intermediate decisions (Age ≥ 30?, Loyalty Member?)
Leaf Nodes: Final predictions (Responded, No Response)
Branches: Possible outcomes of each decision

Sample Data Leading to Tree

Income	Age	Loyalty	Response
$75K	35	Yes	Yes
$30K	25	Yes	Yes
$45K	22	No	No

Building Your First Tree (Demo)

Step-by-step tree construction

Bank Loan Dataset

Income	Credit Score	Age	Loan Approved
$80K	750	35	Yes
$35K	620	28	No
$65K	710	42	Yes
$25K	580	25	No
$90K	780	38	Yes
$40K	650	31	No

Step 1: Choose Best Split

Calculate information gain for each feature

Step 2: Split Dataset

Divide data based on chosen feature

Step 3: Repeat Process

Apply same logic to each subset

Step 4: Stop Condition

When nodes are pure or depth limit reached

Resulting Decision Tree:

Best first split: Credit Score ≥ 700

Left branch: Income ≥ $60K → Final decision

Right branch: All approved (pure node)

Information Gain Concept

How do we choose the best questions?

Before Split: Mixed Results

Customer Type	Count	Response Rate
All Customers	100	50% (Mixed)

High Uncertainty: Equal mix of responses

After Split: Reduced "Messiness"

Customer Type	Count	Response Rate
High Income	40	80% (Clear)
Low Income	60	30% (Clear)

Low Uncertainty: Clear patterns in each group

Information Gain = Entropy(Parent) - Weighted Average(Entropy(Children))

Goal: Maximize information gain by choosing splits that create the most homogeneous subgroups

Intuition: Good splits transform confusion into clarity. We want each resulting group to be as "pure" as possible in terms of the target outcome.

Entropy Explained Simply

Entropy = measure of disorder/uncertainty

High Entropy (Mixed)

Entropy ≈ 1.58

Highly unpredictable

Low Entropy (Sorted)

Entropy = 0

Completely predictable

Real Example: Customer Campaign Response

All Customers

50 Responded, 50 No Response

Entropy = 1.0

High Income Group

80 Responded, 20 No Response

Entropy = 0.72

Loyalty Members

95 Responded, 5 No Response

Entropy = 0.29

Entropy = -Σ p(i) × log₂(p(i))

where p(i) is the proportion of samples belonging to class i

Gini Index Alternative

Another way to measure purity

What is Purity in Decision Trees?

Purity means how "unmixed" or "homogeneous" a group is. Think of it like:

Pure milk: Only milk, no water mixed in
Pure customer group: All customers behave the same way (all respond or all don't respond)
Impure mixture: 50% responders, 50% non-responders = very mixed/impure
High purity: 95% responders, 5% non-responders = almost pure

Goal: Split data to create the purest possible groups - where customers in each group behave as similarly as possible.

Entropy

Range: 0 to log₂(classes)
Based on information theory
More computationally expensive
Emphasizes balanced splits
Formula: -Σ p(i) × log₂(p(i))

Gini Index

Range: 0 to 0.5 (binary classification)
Based on probability theory
Faster to compute
Emphasizes largest class
Formula: 1 - Σ p(i)²

Interactive Calculation: 2-Class Example

Dataset: 60 customers responded, 40 did not respond

Calculate Gini Index:

p(Responded) = 60/100 = 0.6

p(No Response) = 40/100 = 0.4

Gini = 1 - (0.6² + 0.4²) = 1 - (0.36 + 0.16) = 0.48

Complete Solution:

Gini Index = 1 - (0.6² + 0.4²) = 1 - (0.36 + 0.16) = 1 - 0.52 = 0.48

Interpretation: Gini = 0.48 indicates moderate impurity. Lower values (closer to 0) represent more homogeneous groups, while higher values indicate more mixed groups.

Hands-On Activity 1

Calculate the best split for campaign response prediction

Customer Dataset (12 records)

Age	Income	Email Rate	Response
25	35K	0.2	No
35	65K	0.6	Yes
45	85K	0.8	Yes
28	40K	0.3	No
52	95K	0.7	Yes
31	55K	0.4	Yes
42	75K	0.9	Yes
29	45K	0.1	No
38	70K	0.5	Yes
33	50K	0.3	No
47	90K	0.8	Yes
26	38K	0.2	No

Key Formulas

Entropy = -Σ p(i) × log₂(p(i))

Information Gain = Entropy(Parent) - Weighted Average(Entropy(Children))

where p(i) is the proportion of samples in class i

Your Task: Find Best Split

1. Current entropy of all data:

7 Yes, 5 No → Entropy =

2. Test split: Email Rate ≥ 0.5

Left branch (≥0.5): Yes, No

Right branch (<0.5): Yes, No

3. Calculate entropies for each branch:

Entropy(Left) =

Entropy(Right) =

4. Calculate weighted average entropy:

Weighted Avg = (/12) × + (/12) ×

=

5. Calculate information gain:

Information Gain = - =

Complete Solution:

1. Initial entropy: H(S) = -7/12×log₂(7/12) - 5/12×log₂(5/12) = 0.98

2. Email Rate ≥ 0.5: 6 Yes, 0 No | Email Rate < 0.5: 1 Yes, 5 No

3. Entropy(Left) = 0, Entropy(Right) = 0.65

4. Weighted Avg = (6/12)×0 + (6/12)×0.65 = 0.325

5. Information Gain = 0.98 - 0.325 = 0.655

Conclusion: This is an excellent split with high information gain!

Decision Tree Advantages & Limitations

Understanding when to use single trees

Advantages

Interpretability: Easy to understand and explain to stakeholders
No preprocessing: Handles both numerical and categorical data
Feature selection: Automatically identifies important variables
Non-linear relationships: Captures complex interactions
Fast training: Efficient algorithm for large datasets
Rule extraction: Can extract explicit if-then rules

Limitations

Overfitting: Can create overly complex trees
Instability: Small data changes cause large tree changes
Bias: Favors features with more levels
Linear relationships: Poor at capturing linear patterns
Greedy algorithm: May not find globally optimal tree
Predictive accuracy: Often outperformed by ensemble methods

Key Insight: Single decision trees are excellent for interpretation and quick insights, but their limitations motivate ensemble methods like Random Forests for improved predictive performance.

Best Use Cases

Exploratory analysis, rule generation, and situations requiring interpretability

Alternative Solutions

Random Forests for accuracy, pruning for overfitting, gradient boosting for performance

The Wisdom of Crowds

Why ask one expert when you can ask many?

Guess the Number of Jellybeans in the Jar

⋮ ⋮ ⋮

Glass Jar with Colorful Jellybeans

150-200

250-300

350-400

450-500

Individual Expert Prediction

Single point of view
Personal biases affect judgment
Can be significantly wrong
High variance in predictions
Example: One person guesses 275 jellybeans

Crowd Average Prediction

Multiple diverse perspectives
Individual biases cancel out
Consistently more accurate
Lower variance, higher reliability
Example: 100 people average to 287 jellybeans

Francis Galton's Discovery (1907): The average guess of 787 people at a county fair was 1,207 pounds for an ox weight. The actual weight was 1,198 pounds - remarkably close!

This principle forms the foundation of ensemble methods in machine learning.

Random Forest Overview

Multiple trees → majority vote → better predictions

Tree 1

Prediction: Responded

Based on Income & Age

Tree 2

Prediction: No Response

Based on Email Rate & Loyalty

Tree 3

Prediction: Responded

Based on Age & Transaction History

Majority Vote Process

2

Responded

vs

1

No Response

Final Prediction: Responded

Sample Customer: Jane, Age 35, Income $65K, Email Rate 0.6, Loyalty Member

Tree	Features Used	Prediction	Confidence
1	Income, Age	Responded	0.8
2	Email Rate, Loyalty	No Response	0.6
3	Age, Transaction History	Responded	0.9

Bootstrapping Explained

Creating different training sets from same data

Original Dataset (N=12 customers)

ID	Age	Income	Response
C1	25	35K	No
C2	35	65K	Yes
C3	45	85K	Yes
C4	28	40K	No
C5	52	95K	Yes
C6	31	55K	Yes
C7	42	75K	Yes
C8	29	45K	No
C9	38	70K	Yes
C10	33	50K	No
C11	47	90K	Yes
C12	26	38K	No

Bootstrap Samples (with replacement)

Bootstrap Sample 1

Selected: C1, C3, C3, C5, C7, C2, C9, C11, C4, C6, C8, C1

Note: C3 and C1 appear twice

Bootstrap Sample 2

Selected: C2, C4, C6, C8, C10, C12, C5, C7, C9, C2, C11, C6

Note: C2 and C6 appear twice

Bootstrap Sample 3

Selected: C5, C1, C9, C11, C3, C7, C4, C8, C10, C12, C5, C9

Note: C5 and C9 appear twice

Key Principle: Each bootstrap sample is the same size as the original dataset but contains different combinations of records. This creates diverse training sets for each tree.

Physical Simulation Process:

Place 12 customer cards in a bag
Draw one card, record it, put it back
Repeat 12 times for one bootstrap sample
Create new bag for next bootstrap sample

Feature Randomness

Each tree considers only some features

Available Features in Dataset

1. Age
2. Income
3. Email Open Rate
4. Loyalty Member
5. Previous Purchases
6. Days Since Last Purchase
7. Average Transaction Value
8. Preferred Channel

Total Features: 8
Typical Selection: √8 ≈ 3 features per split

Feature Subsets for Different Trees

Tree 1 Features

Age, Income, Email Rate

Tree 2 Features

Loyalty, Previous Purchases, Channel

Tree 3 Features

Income, Days Since Purchase, Transaction Value

Without Feature Randomness

Strong features dominate all trees
Trees become very similar
Limited diversity in predictions
Higher correlation between trees
Reduced ensemble benefit

With Feature Randomness

Each tree focuses on different aspects
Diverse decision-making strategies
Captures various feature interactions
Lower correlation between trees
Improved ensemble performance

Benefit: Feature randomness prevents any single strong predictor from dominating the forest, ensuring each tree contributes unique insights to the final prediction.

Random Forest Algorithm

4 simple steps with clear process

1. Bootstrap Sampling

Create B bootstrap samples from original dataset with replacement

Example: For 1000 customers, create 100 different samples of 1000 customers each

2. Random Features

For each split, randomly select √p features from p total features

Example: From 8 features, randomly choose 3 for each decision node

3. Build Trees

Train decision trees on bootstrap samples using random feature subsets

Example: Grow 100 trees, each trained on different data and features

4. Aggregate Predictions

Combine predictions through majority vote (classification) or averaging (regression)

Example: If 70 trees predict "Responded", final prediction is "Responded"

Implementation Example

Original Data: 1000 customers, 8 features
Bootstrap Samples: 100 samples of 1000 customers each
Feature Selection: √8 ≈ 3 features per split
Trees: 100 decision trees trained independently
Prediction: Majority vote from all 100 trees

Sample 1
Features: Age, Income, Email

Sample 2
Features: Loyalty, Purchases, Channel

⋮

96 more trees...

Key Insight: The algorithm's simplicity belies its power. Random sampling in both data and features creates diversity, while aggregation harnesses the collective intelligence of the forest.

Model Evaluation

Understanding performance through confusion matrix and metrics

Confusion Matrix

85
True Negative

15
False Positive

10
False Negative

90
True Positive

Sample Dataset Results (200 customers)

True Negatives: 85 correctly predicted non-responders

True Positives: 90 correctly predicted responders

False Positives: 15 incorrectly predicted responders

False Negatives: 10 missed responders

Performance Metrics

Accuracy

Formula: (TP + TN) / Total

Result: (90 + 85) / 200 = 87.5%

Overall correctness of predictions

Precision

Formula: TP / (TP + FP)

Result: 90 / (90 + 15) = 85.7%

Of predicted responders, how many actually responded?

Recall (Sensitivity)

Formula: TP / (TP + FN)

Result: 90 / (90 + 10) = 90.0%

Of actual responders, how many did we find?

F1-Score

Formula: 2 × (Precision × Recall) / (Precision + Recall)

Result: 2 × (0.857 × 0.90) / (0.857 + 0.90) = 87.8%

Harmonic mean of precision and recall

Interpretation: This model shows strong performance with high accuracy and balanced precision/recall. The 87.5% accuracy means we correctly identify customer response 7 out of 8 times.

Random Forest vs Single Tree

Accuracy, interpretability, and speed trade-offs

Single Decision Tree

Accuracy: Moderate (70-80% typical)
Interpretability: Excellent - clear rules
Training Speed: Very fast
Prediction Speed: Very fast
Overfitting: High risk
Stability: Low - sensitive to data changes
Feature Importance: Limited perspective
Memory Usage: Very low

Random Forest

Accuracy: High (85-95% typical)
Interpretability: Moderate - feature importance
Training Speed: Slower (parallel processing helps)
Prediction Speed: Moderate
Overfitting: Low risk - built-in regularization
Stability: High - robust to data variations
Feature Importance: Comprehensive ranking
Memory Usage: Higher (stores multiple trees)

Scenario Decision Guide

Categorize these scenarios for Single Tree vs Random Forest:

Scenario A: Medical Diagnosis

Doctor needs to understand exactly why the system made a diagnosis recommendation.

Single Tree

Random Forest

Scenario B: Credit Card Fraud

Bank needs highest possible accuracy to minimize false positives and false negatives.

Single Tree

Random Forest

Scenario C: Mobile App

Real-time recommendations with millisecond response requirements.

Single Tree

Random Forest

Scenario D: Marketing Campaign

Optimize customer targeting with large dataset and complex patterns.

Single Tree

Random Forest

Evaluation Quiz

Test your understanding: Calculate model performance

Scenario: Email Marketing Campaign Prediction

Your Random Forest model predicted customer responses for 500 customers. Here are the results:

220

30

40

210

Total Customers: 500

True Negatives (TN): 220

False Positives (FP): 30

False Negatives (FN): 40

True Positives (TP): 210

Calculate Performance Metrics

1. Accuracy

Formula: (TP + TN) / Total

Calculation: ( + ) /

Result: or %

2. Precision

Formula: TP / (TP + FP)

Calculation: / ( + )

Result: or %

3. Recall (Sensitivity)

Formula: TP / (TP + FN)

Calculation: / ( + )

Result: or %

4. F1-Score

Formula: 2 × (Precision × Recall) / (Precision + Recall)

Using your calculated Precision and Recall values:

F1-Score: or %

Business Interpretation

After calculating the metrics above, write a brief interpretation:

Summary and Next Steps

From trees to forests: A journey in ensemble learning

What We Learned

Decision trees make predictions through hierarchical splits, Random forests combine multiple trees for better accuracy, Ensemble methods leverage the wisdom of crowds

Key Takeaways

Single trees are interpretable but prone to overfitting, Random forests trade interpretability for robustness, Bootstrap sampling and feature randomness create diversity

Practical Applications

Customer segmentation and campaign optimization, Risk assessment and fraud detection, Recommendation systems and personalization

Decision

Trees ✓

Random

Forests ✓

Advanced:

Deep Learning

Immediate Next Steps

Practice with the retail campaign dataset
Implement Random Forest in Python/R
Compare performance with single trees
Analyze feature importance rankings
Experiment with hyperparameter tuning

Advanced Topics to Explore

Gradient Boosting (XGBoost, LightGBM)
Out-of-bag error estimation
Handling imbalanced datasets
Feature engineering for tree-based models
Model interpretability techniques (SHAP, LIME)

Remember: Machine learning is about finding patterns in data to make predictions. Tree-based methods excel at capturing non-linear relationships while remaining relatively interpretable - making them invaluable tools in the data scientist's toolkit.

The Problem of Overfitting

When your model memorizes instead of learning

Training Data: Customer Campaign Response

Customer	Age	Income	Email Rate	Response
A	25	35K	0.2	No
B	35	65K	0.6	Yes
C	45	85K	0.8	Yes
D	28	40K	0.3	No
E	52	95K	0.7	Yes
F	31	55K	0.4	Yes

Training Accuracy: 100% ✓

Overfitted Decision Tree

Age ≥ 26?

Yes

Income ≥ 36K?

Yes

Email ≥ 0.35?

Yes

No

New Customer Test Data

Customer	Age	Income	Email Rate	Actual	Predicted
G	30	60K	0.5	Yes	No
H	40	70K	0.6	Yes	Yes
I	33	45K	0.2	No	Yes
J	27	50K	0.3	No	No

Test Accuracy: 50% ✗

Signs of Overfitting

Perfect training accuracy: 100% correct on training data
Poor test performance: Low accuracy on new data
Complex tree structure: Too many splits and branches
Memorization: Learns specific examples, not patterns
High variance: Small data changes cause big model changes

Why Overfitting Happens

Too much complexity: Tree grows too deep with too many rules
Limited training data: Not enough examples to learn true patterns
Noise in data: Model learns from random variations
No stopping criteria: Tree keeps splitting until pure leaves
Feature abundance: Too many irrelevant features confuse the model

Business Impact: An overfitted marketing model might achieve 100% accuracy on historical campaigns but fail completely on new customers, leading to wasted marketing budget and poor targeting decisions.

The Problem of Underfitting

When your model is too simple to capture patterns

Training Data: Customer Campaign Response

Customer	Age	Income	Email Rate	Loyalty	Response
A	25	35K	0.2	No	No
B	35	65K	0.6	Yes	Yes
C	45	85K	0.8	Yes	Yes
D	28	40K	0.3	No	No
E	52	95K	0.7	Yes	Yes
F	31	55K	0.4	Yes	Yes
G	29	45K	0.1	No	No
H	38	70K	0.5	Yes	Yes

Underfitted Decision Tree (Too Simple)

Income ≥ $50K?

Yes

Responded

No

No Response

Only considers income, ignores other important features

Training Performance

Accuracy: 75% (6/8 correct)

Misses important patterns in the data

Test Performance

Accuracy: 73% (similar to training)

Consistently mediocre performance

Signs of Underfitting

Poor training accuracy: Low performance on training data
Poor test performance: Similar low accuracy on new data
Overly simple model: Too few splits or rules
High bias: Makes systematic errors consistently
Ignores important features: Misses relevant information

Why Underfitting Happens

Too simple model: Tree depth limited too severely
Insufficient features: Important variables not included
Poor feature selection: Wrong variables chosen for splits
Early stopping: Tree building stopped too soon
Over-regularization: Too many constraints on model complexity

Fixing Overfitting

Solutions: Pruning, cross-validation, ensemble methods, more training data

Sweet Spot

Goal: Balance complexity and generalization for optimal performance

Fixing Underfitting

Solutions: Deeper trees, more features, feature engineering, reduce regularization

The Goldilocks Principle: Your model should be complex enough to capture important patterns but simple enough to generalize to new data. Random Forests help achieve this balance through ensemble averaging.

Age	Income	Email Rate	Response
25	35K	0.2	No
35	65K	0.6	Yes
45	85K	0.8	Yes
28	40K	0.3	No
52	95K	0.7	Yes
31	55K	0.4	Yes
42	75K	0.9	Yes
29	45K	0.1	No
38	70K	0.5	Yes
33	50K	0.3	No
47	90K	0.8	Yes
26	38K	0.2	No

ID	Age	Income	Response
C1	25	35K	No
C2	35	65K	Yes
C3	45	85K	Yes
C4	28	40K	No
C5	52	95K	Yes
C6	31	55K	Yes
C7	42	75K	Yes
C8	29	45K	No
C9	38	70K	Yes
C10	33	50K	No
C11	47	90K	Yes
C12	26	38K	No

Customer	Age	Income	Email Rate	Response
A	25	35K	0.2	No
B	35	65K	0.6	Yes
C	45	85K	0.8	Yes
D	28	40K	0.3	No
E	52	95K	0.7	Yes
F	31	55K	0.4	Yes

Customer	Age	Income	Email Rate	Loyalty	Response
A	25	35K	0.2	No	No
B	35	65K	0.6	Yes	Yes
C	45	85K	0.8	Yes	Yes
D	28	40K	0.3	No	No
E	52	95K	0.7	Yes	Yes
F	31	55K	0.4	Yes	Yes
G	29	45K	0.1	No	No
H	38	70K	0.5	Yes	Yes

Age	Income	Email Rate	Response
25	35K	0.2	No
35	65K	0.6	Yes
45	85K	0.8	Yes
28	40K	0.3	No
52	95K	0.7	Yes
31	55K	0.4	Yes
42	75K	0.9	Yes
29	45K	0.1	No
38	70K	0.5	Yes
33	50K	0.3	No
47	90K	0.8	Yes
26	38K	0.2	No

ID	Age	Income	Response
C1	25	35K	No
C2	35	65K	Yes
C3	45	85K	Yes
C4	28	40K	No
C5	52	95K	Yes
C6	31	55K	Yes
C7	42	75K	Yes
C8	29	45K	No
C9	38	70K	Yes
C10	33	50K	No
C11	47	90K	Yes
C12	26	38K	No

Customer	Age	Income	Email Rate	Response
A	25	35K	0.2	No
B	35	65K	0.6	Yes
C	45	85K	0.8	Yes
D	28	40K	0.3	No
E	52	95K	0.7	Yes
F	31	55K	0.4	Yes

Customer	Age	Income	Email Rate	Loyalty	Response
A	25	35K	0.2	No	No
B	35	65K	0.6	Yes	Yes
C	45	85K	0.8	Yes	Yes
D	28	40K	0.3	No	No
E	52	95K	0.7	Yes	Yes
F	31	55K	0.4	Yes	Yes
G	29	45K	0.1	No	No
H	38	70K	0.5	Yes	Yes

Age	Income	Email Rate	Response
25	35K	0.2	No
35	65K	0.6	Yes
45	85K	0.8	Yes
28	40K	0.3	No
52	95K	0.7	Yes
31	55K	0.4	Yes
42	75K	0.9	Yes
29	45K	0.1	No
38	70K	0.5	Yes
33	50K	0.3	No
47	90K	0.8	Yes
26	38K	0.2	No

ID	Age	Income	Response
C1	25	35K	No
C2	35	65K	Yes
C3	45	85K	Yes
C4	28	40K	No
C5	52	95K	Yes
C6	31	55K	Yes
C7	42	75K	Yes
C8	29	45K	No
C9	38	70K	Yes
C10	33	50K	No
C11	47	90K	Yes
C12	26	38K	No

Customer	Age	Income	Email Rate	Response
A	25	35K	0.2	No
B	35	65K	0.6	Yes
C	45	85K	0.8	Yes
D	28	40K	0.3	No
E	52	95K	0.7	Yes
F	31	55K	0.4	Yes

Customer	Age	Income	Email Rate	Loyalty	Response
A	25	35K	0.2	No	No
B	35	65K	0.6	Yes	Yes
C	45	85K	0.8	Yes	Yes
D	28	40K	0.3	No	No
E	52	95K	0.7	Yes	Yes
F	31	55K	0.4	Yes	Yes
G	29	45K	0.1	No	No
H	38	70K	0.5	Yes	Yes