DATA5000 Comprehensive Review Quiz

Question 1: AI vs ML Relationship

Below is a diagram showing the relationship between different AI concepts:

Based on this diagram, which statement is most accurate?

Machine Learning and Deep Learning are completely separate from AI
Deep Learning is a subset of Machine Learning, which is a subset of AI
AI is a subset of Machine Learning
All three terms mean the same thing

Question 2: Four Types of Business Analytics

The chart below shows the progression of business analytics types:

A retail company wants to decide whether to increase inventory for winter products. Which type of analytics should they primarily use?

Descriptive - to see last year's winter sales
Diagnostic - to understand why winter sales changed
Predictive - to forecast winter demand
Prescriptive - to determine optimal inventory levels

Question 3: Machine Learning Workflow

The flowchart below shows the typical ML workflow:

In the ML workflow, what is the primary purpose of the Train/Test Split step?

To make the dataset smaller and easier to process
To evaluate model performance on unseen data and detect overfitting
To create two identical copies of the data for backup
To separate different types of features in the dataset
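As a study aid, the split can be sketched in plain Python. This is a minimal illustration, not a library API: `split_train_test`, the 20% test fraction, and the fixed seed are hypothetical choices. In practice, scikit-learn's `sklearn.model_selection.train_test_split` does the same job.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle rows, then hold out a fraction as an unseen test set (hypothetical helper)."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]               # never shown to the model during training
    train = shuffled[n_test:]
    return train, test

data = list(range(100))  # hypothetical dataset of 100 records
train, test = split_train_test(data)
print(len(train), len(test))  # 80 20
```

Because the test rows never reach training, a large gap between training and test accuracy signals overfitting.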

Question 4: LightGBM vs NeuralProphet

Comparison of two algorithms covered in Week 2:

Algorithm	Best Use Case	Data Type	Key Strength
LightGBM	Tabular data classification/regression	Structured data with features	High accuracy, fast training
NeuralProphet	Time series forecasting	Sequential time-based data	Captures complex seasonality

A company wants to predict customer purchase amounts based on their demographics, past purchases, and website behavior. Which algorithm would be most appropriate?

NeuralProphet, because it handles multiple features
LightGBM, because it's designed for structured tabular data with multiple features
Both algorithms would perform equally well
Neither algorithm is suitable for this task

Question 5: Neural Network Architecture

Below is a simplified neural network diagram:

What is the primary purpose of the hidden layer in this neural network?

To store the input data temporarily
To learn complex patterns and relationships between input features
To reduce the size of the data
To generate the final output directly
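A rough sketch of a forward pass through one hidden layer, assuming sigmoid activations and small hand-picked weights (hypothetical values, not learned from data). Each hidden neuron mixes all of the inputs, so the output layer works with combinations of features rather than the raw inputs alone.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """Fully connected layer: each neuron takes a weighted sum of all inputs plus its bias."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, neuron_w)) + b)
            for neuron_w, b in zip(weights, biases)]

# Hypothetical weights for a 2-input -> 3-hidden -> 1-output network (illustrative only)
hidden_w = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.2]]
hidden_b = [0.1, -0.2, 0.05]
output_w = [[0.7, -0.5, 0.9]]
output_b = [0.0]

x = [1.0, 0.5]
h = dense(x, hidden_w, hidden_b)   # hidden layer: intermediate features combining both inputs
y = dense(h, output_w, output_b)   # output layer builds on the hidden features, not raw inputs
print(h, y)
```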

Question 6: Transformer Attention Mechanism

Below is a visualization of how attention works in transformers:

What does the attention mechanism in transformers primarily accomplish?

It reduces the computational cost of processing
It allows the model to focus on relevant parts of the input when processing each element
It eliminates the need for training data
It converts text into numerical format
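A minimal sketch of scaled dot-product attention in plain Python, assuming self-attention over a hypothetical three-token sequence where queries, keys, and values are simply the raw token vectors. Real transformers learn separate projection matrices for queries, keys, and values and run several attention heads in parallel; both are omitted here.

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Each output is a weighted average of the values, weighted by
    how well that position's query matches every key."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)            # attention weights for this position sum to 1
        all_weights.append(weights)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs, all_weights

# Hypothetical 3-token sequence of 2-dimensional vectors (self-attention: Q = K = V)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = attention(tokens, tokens, tokens)
```

Each row of `w` shows how strongly one token attends to every token in the sequence; the first token, for example, gives equal weight to itself and the third token and less to the second.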

Question 7: Treatment Effects Visualization

Below is a chart showing treatment effects for a marketing campaign across different customer segments:

What does this data suggest about the marketing campaign's effectiveness?

The campaign works equally well for all customer segments
The campaign shows heterogeneous treatment effects, being most effective for younger customers
The campaign only works for senior customers
The average treatment effect accurately represents the individual effects
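To see why an average can hide heterogeneous effects, the sketch below compares segment-level effects with their size-weighted average. The segment names and every number are invented for illustration and are not the chart's values; it also assumes customers were randomly assigned within each segment, so a simple treated-minus-control difference estimates the effect.

```python
# Hypothetical segment results, invented for illustration (NOT the chart's values):
# segment -> (customers, avg outcome if treated, avg outcome if control)
segments = {
    "Segment A": (400, 62.0, 40.0),
    "Segment B": (400, 50.0, 44.0),
    "Segment C": (200, 45.0, 44.0),
}

# Segment-level (conditional) effect: treated minus control within each segment
cate = {name: treated - control for name, (_, treated, control) in segments.items()}

# Average treatment effect: segment effects weighted by segment size
total = sum(n for n, _, _ in segments.values())
ate = sum(segments[name][0] * effect for name, effect in cate.items()) / total
print(cate, ate)
```

Here the overall effect (11.4) matches none of the segment effects (22, 6, and 1), which is exactly what a single average treatment effect can obscure.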

Question 8: SHAP Values Interpretation

SHAP values for a house price prediction model:

Based on these SHAP values, which feature has the strongest impact on increasing the house price?

House size
Location
Garden
Age of the house

Question 9: True/False

Deep Learning is always better than traditional Machine Learning for business applications.

True
False

Question 10: True/False

Correlation always implies causation in business data analysis.

True
False

Question 11: Analytics Integration

A retail company has the following business scenario:

They want to understand customer purchasing patterns
Predict future sales for inventory planning
Determine the causal impact of marketing campaigns
Decide optimal marketing budget allocation

Which combination of analytics types would best address these needs?

Only predictive analytics is needed
Descriptive, predictive, and prescriptive analytics in sequence
Only prescriptive analytics is sufficient
Diagnostic and descriptive analytics only

Question 12: Algorithm Selection

A streaming service wants to:

Predict how many users will watch a new show (based on viewer demographics, genre preferences, time of release)
Forecast daily viewing hours for capacity planning (using historical viewing data)
Understand what factors drive user engagement (interpretable model needed)

Which algorithms would be most appropriate for these three tasks respectively?

NeuralProphet, LightGBM, Deep Learning
LightGBM, NeuralProphet, SHAP with LightGBM
Deep Learning, Transformers, Neural Networks
All tasks should use the same algorithm

Question 13: Cross-Validation Calculation

A machine learning model is evaluated using 5-fold cross-validation with the following accuracy scores for each fold:

Fold 1	Fold 2	Fold 3	Fold 4	Fold 5
0.87	0.92	0.85	0.89	0.91

Calculate the average cross-validation accuracy (round to 2 decimal places):
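To check the arithmetic, the mean fold accuracy can be computed with Python's standard library. Reporting the standard deviation alongside it is a common convention (an addition, not something the question asks for) that shows how stable the model is across folds.

```python
import statistics

fold_scores = [0.87, 0.92, 0.85, 0.89, 0.91]  # accuracies from the table above
cv_mean = statistics.mean(fold_scores)
cv_std = statistics.stdev(fold_scores)        # sample standard deviation across folds
print(f"mean = {cv_mean:.2f}, std = {cv_std:.3f}")
```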

Question 14: Treatment Effect Calculation

A marketing campaign analysis shows the following results:

Group	Average Purchase Before ($)	Average Purchase After ($)
Treatment Group (received campaign)	45.20	67.80
Control Group (no campaign)	44.50	48.90

Calculate the Difference-in-Differences (DiD) treatment effect (round to 2 decimal places):

Formula: DiD = (Treatment_After - Treatment_Before) - (Control_After - Control_Before)

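The formula translates directly into a small function. Subtracting the control group's change removes the shift that would have happened anyway, assuming both groups would have followed parallel trends without the campaign.

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Average purchase amounts ($) from the table above
did = diff_in_diff(45.20, 67.80, 44.50, 48.90)
print(f"DiD effect: ${did:.2f}")
```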

Question 15: SHAP Values Calculation

A house price prediction model has the following components:

Component	Value
Base Value (average prediction)	$280,000
Location SHAP value	+$45,000
Size SHAP value	+$32,000
Age SHAP value	-$18,000
Condition SHAP value	+$12,000

Calculate the final model prediction for this house:

Formula: Prediction = Base Value + Sum of all SHAP values

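SHAP values are additive: starting from the base value, each feature's contribution moves the prediction up or down. A short check in Python, using the values from the table:

```python
base_value = 280_000
shap_values = {"Location": 45_000, "Size": 32_000, "Age": -18_000, "Condition": 12_000}

# Final prediction = base value + sum of every feature's SHAP contribution
prediction = base_value + sum(shap_values.values())

# Features ranked by the size of their push on the price, largest first
ranked = sorted(shap_values, key=lambda f: abs(shap_values[f]), reverse=True)
print(prediction, ranked)
```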

Question 16: Neural Network Forward Propagation

A simple neural network neuron receives the following inputs and has learned these weights:

Input	Value	Weight	Input × Weight
x₁ (house size)	0.8	0.6	0.48
x₂ (location score)	0.9	0.4	0.36
x₃ (house age)	0.3	-0.2	-0.06
Bias	1.0	0.1	0.10

Calculate the final output of this neuron after applying the sigmoid activation function (round to 2 decimal places):

Step 1: z = Σ(input × weight) + bias = 0.48 + 0.36 + (-0.06) + 0.10 = ?

Step 2: output = σ(z) = 1/(1+e⁻ᶻ)

Question 17: Comprehensive Understanding

You are hired as an AI consultant for a healthcare system. The client asks:

"We want to use AI to improve patient outcomes and reduce costs. We have 5 years of patient data including demographics, treatments, outcomes, and costs. What's your approach?"

Which comprehensive approach best demonstrates your DATA5000 knowledge?

Build a deep learning model to predict everything
Start with descriptive analytics to understand patterns, use predictive models for risk assessment, apply causal analysis to identify effective treatments, and use prescriptive analytics for optimal resource allocation
Only focus on correlation analysis between treatments and outcomes
Use only traditional statistical methods without AI