Week 7 · Assessment Quiz

Random Forests & scikit-learn

25 multiple-choice questions on decision trees, random forests, feature importance and sklearn, plus 5 short-answer questions.

📋 30 questions total ⭐ 30 marks 🕐 No time limit 🔒 Answers not revealed

Q1

A decision tree makes predictions by:

A. Computing the dot product of input features and learned weights
B. Recursively splitting the feature space based on threshold rules learned from training data
C. Averaging the predictions of many weak learners
D. Projecting features onto principal components and using their magnitudes

Q2

A random forest is best described as:

A. A single very deep decision tree trained on the full dataset
B. A neural network with tree-shaped skip connections
C. An ensemble of many decision trees, each trained on a bootstrap sample with random feature subsampling, whose predictions are averaged or majority-voted
D. A decision tree where splits are chosen randomly rather than optimally

Q3

Bagging (Bootstrap Aggregating) works by:

A. Boosting the weights of misclassified examples in each iteration
B. Training multiple models on different random subsets (with replacement) of the training data and combining their predictions
C. Removing outliers from the training set before fitting each tree
D. Pruning each tree back to a maximum depth before combining

Q4

Feature importance in a random forest is typically measured by:

A. The total decrease in impurity across all splits on that feature, averaged over all trees
B. The correlation coefficient between the feature and the target variable
C. The number of times the feature appears in the training data
D. The variance of the feature values in the training set

Q5

The out-of-bag (OOB) error is:

A. The error rate on a manually held-out test set
B. The error measured on the training examples that were excluded from each tree's bootstrap sample, acting as a free validation estimate
C. The average depth of all trees in the forest
D. The fraction of features not used in any split
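
For reference, the bootstrap sampling behind this question can be sketched in plain Python. The helper names `bootstrap_indices` and `out_of_bag_indices` are hypothetical and only illustrate the idea; they are not part of scikit-learn.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one tree's bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def out_of_bag_indices(n_samples, in_bag):
    """Return the rows that were never drawn for this tree (its out-of-bag rows)."""
    drawn = set(in_bag)
    return [i for i in range(n_samples) if i not in drawn]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_samples = 1000
in_bag = bootstrap_indices(n_samples, rng)
oob = out_of_bag_indices(n_samples, in_bag)
oob_fraction = len(oob) / n_samples

# The expected out-of-bag fraction approaches 1/e (about 0.368) as n_samples grows.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

Each tree in a bagged ensemble would get its own draw, so every training row is out-of-bag for some subset of the trees.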

Q6

n_estimators in sklearn's RandomForestClassifier controls:

A. The maximum depth of each individual tree
B. The number of trees in the forest
C. The minimum number of samples required to split a node
D. The fraction of features randomly considered at each split

Q7

Setting a very small max_depth on a decision tree results in:

A. High bias (underfitting): the tree is too shallow to capture complex patterns
B. High variance (overfitting): the tree memorises training noise
C. Faster inference but identical accuracy to a deeper tree
D. More features being considered at each split

Q8

Random forests use feature subsampling at each split to:

A. Speed up training by ignoring unimportant features permanently
B. Ensure each tree uses exactly the same features for reproducibility
C. Decorrelate the trees so they make different errors, reducing ensemble variance
D. Reduce memory usage by not loading all feature values

Q9

Compared to a single decision tree, a random forest generally has:

A. Higher bias and higher variance
B. Lower variance because averaging many trees reduces the impact of any single noisy tree
C. Higher variance because more trees means more parameters
D. Lower bias because more trees can capture more complex patterns

Q10

max_features in sklearn's RandomForestClassifier controls:

A. The number of features randomly selected as candidates at each split
B. The maximum number of leaf nodes in each tree
C. The total number of features in the dataset after preprocessing
D. The maximum depth allowed for each tree

Q11

Pruning a decision tree achieves:

A. Growing the tree as deep as possible to minimise training error
B. Adding more branches at the leaves to improve coverage of rare examples
C. Removing branches that provide little predictive power, reducing model complexity and overfitting
D. Merging two decision trees into a single combined tree

Q12

Gini impurity at a node measures:

A. The average depth at which a split is made in the tree
B. The probability of incorrectly classifying a randomly chosen element if it is labelled according to the class distribution at that node
C. The correlation between the feature used for the split and the target
D. The total number of training examples that reach that node
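
For reference, Gini impurity at a node with class proportions p_k is 1 - sum_k p_k^2. A minimal sketch in plain Python, where `gini_impurity` is a hypothetical helper written for illustration rather than a scikit-learn function:

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for the class labels reaching one node."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen for this sketch: an empty node counts as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


pure_node = gini_impurity(["cat", "cat", "cat"])          # one class only
mixed_node = gini_impurity(["cat", "dog", "cat", "dog"])  # 50/50 split of two classes
print(pure_node, mixed_node)  # 0.0 0.5
```

A pure node scores 0, and a two-class node is most impure at 0.5 when the classes are evenly mixed.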

Q13

A decision tree will overfit when:

A. It is allowed to grow until every leaf is pure (contains only one class), memorising the training data
B. Its maximum depth is set to 2, limiting it to very simple decision boundaries
C. It is trained on more than 10,000 examples per class
D. It uses Gini impurity instead of information gain as the splitting criterion

Q14

The key advantage of random forests over single decision trees is:

A. They always train faster because each tree is smaller
B. They can handle text data without any preprocessing
C. They reduce variance by combining many decorrelated trees, which typically leads to better generalisation
D. They require no hyperparameter tuning unlike single trees

Q15

oob_score=True in sklearn:

A. Adds the OOB examples to the training set after initial fitting
B. Computes a validation score using OOB samples for each tree, providing a free estimate of generalisation without a separate validation set
C. Raises an error if any examples were not included in any bootstrap sample
D. Plots the OOB error as a function of the number of trees
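
For reference, a minimal sketch of how this flag is typically used. The synthetic dataset from `make_classification` and the specific hyperparameter values are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used here only so the sketch runs on its own.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)

# oob_score=True requires bootstrap sampling, which is the default for forests.
forest = RandomForestClassifier(
    n_estimators=100, oob_score=True, bootstrap=True, random_state=0
)
forest.fit(X, y)

# After fitting, the out-of-bag accuracy is exposed as the oob_score_ attribute.
print(f"OOB accuracy: {forest.oob_score_:.3f}")
```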

Q16

In sklearn, model.fit(X_train, y_train):

A. Trains the model on X_train with y_train as labels, learning the model parameters
B. Evaluates the model on X_train and returns the training accuracy
C. Preprocesses X_train by normalising and encoding it
D. Saves the trained model to disk in sklearn's native format

Q17

The difference between predict() and predict_proba() in sklearn is:

A. predict() is faster; predict_proba() is more accurate
B. predict() requires GPU; predict_proba() runs on CPU
C. predict() returns the predicted class label; predict_proba() returns class probabilities for each example
D. predict_proba() rounds probabilities to 0 or 1, making it equivalent to predict()
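
For reference, the two methods can be compared side by side on a fitted classifier. This is a sketch on assumed synthetic data; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data, assumed for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

labels = model.predict(X[:5])               # one value per row, shape (5,)
probabilities = model.predict_proba(X[:5])  # one row per example, one column per class

print(labels.shape, probabilities.shape)
print(np.round(probabilities, 2))
```

The columns of `predict_proba` follow the order of `model.classes_`, and each row sums to 1.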

Q18

Cross-validation is used in model evaluation to:

A. Train the model faster by using smaller datasets
B. Obtain a more reliable estimate of generalisation performance by averaging metrics over multiple train/test splits
C. Automatically select the best hyperparameters without human intervention
D. Combine multiple models into an ensemble
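
For reference, a minimal k-fold cross-validation sketch with scikit-learn's `cross_val_score`. The dataset, the choice of 5 folds, and the forest settings are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data, assumed for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# cv=5 fits the model five times, each time scoring on a different held-out fold.
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```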

Q19

A model with high bias and low variance is likely:

A. Underfitting: it is too simple and makes systematic errors
B. Overfitting: it fits the training data very well but generalises poorly
C. Well-calibrated: it strikes the ideal balance between complexity and generalisation
D. Converging too slowly due to a small learning rate

Q20

A feature importance of 0.0 for a column means:

A. The column has zero variance: all values are identical
B. The column was normalised to zero before training
C. That feature was never used in any split in any tree, so it contributes nothing to the model's predictions
D. The model assigned negative importance which was clipped to zero

Q21

An ensemble method improves prediction by:

A. Training a single very large model with billions of parameters
B. Combining predictions from multiple models to reduce variance and/or bias compared to any individual model
C. Using a larger training dataset automatically sourced from the internet
D. Applying advanced data augmentation during the final evaluation step

Q22

A decision tree grown without any depth limit will typically:

A. Perfectly fit the training data but generalise poorly: high variance, low bias (overfit)
B. Converge to the globally optimal set of splits for the training data
C. Achieve the same test accuracy as a pruned tree
D. Fail to grow because sklearn requires a max_depth to be specified

Q23

Random forests reduce variance compared to a single tree because:

A. Each tree is more accurate than a single deep tree
B. Trees are trained on the same data but with different architectures
C. Averaging many uncorrelated (or weakly correlated) predictions cancels out individual tree errors
D. Random forests use a lower learning rate that prevents overfitting

Q24

What is grid search used for in model development?

A. Plotting the model's decision boundary on a 2D feature grid
B. Systematically searching over a specified set of hyperparameter combinations to find the one with the best validation performance
C. Dividing the feature space into a uniform grid for efficient split computation
D. Ranking features by importance using a brute-force exhaustive search
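
For reference, a small sketch using scikit-learn's `GridSearchCV`. The parameter grid, the 3-fold setting, and the synthetic data are assumptions picked to keep the run short, not recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, assumed for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Every combination in the grid (2 x 2 = 4 candidates) is scored with 3-fold CV.
param_grid = {"n_estimators": [20, 50], "max_depth": [2, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

The cost grows multiplicatively with the grid, since each candidate is refitted once per fold.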

Q25

Which of the following would you use to visualise the relative importance of features in a trained sklearn random forest?

A. model.feature_importances_: an array of importance scores, one per feature, which can be plotted as a bar chart
B. model.feature_names_in_: returns the features sorted by importance
C. model.coef_: the coefficient vector equivalent to feature importances
D. model.oob_score_: the OOB score represents feature importances
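
For reference, a sketch of inspecting per-feature scores on a fitted forest. The synthetic dataset and the `feature_names` list are hypothetical, and the scores are printed as a ranked table rather than plotted to avoid a plotting dependency.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with 3 informative features out of 6, assumed for illustration.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = model.feature_importances_  # impurity-based scores, one per column

# Rank features from most to least important.
order = np.argsort(importances)[::-1]
for i in order:
    print(f"{feature_names[i]:<10} {importances[i]:.3f}")
```

The same `importances` array could be passed to a bar-chart function in a plotting library of your choice.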

Answer each question in 2–4 sentences. Precise technical language is expected. Code snippets are welcome where relevant.

Q26

Explain how a random forest makes a prediction for a new data point. What role does each individual tree play and how are their outputs combined?


Q27

What is the out-of-bag (OOB) error in a random forest? Why does it provide a useful estimate of generalisation performance without needing a separate validation set?


Q28

Explain how feature importance is computed in a random forest. How could you use feature importances practically to improve a model or understand your data?


Q29

Explain the bias-variance tradeoff. Where does a fully-grown single decision tree sit on this spectrum, and where does a random forest sit? Why?


Q30

Compare and contrast decision trees and random forests across three dimensions: interpretability, variance, and computational cost. When would you choose a single decision tree over a random forest?



Complete all 30 questions, then click Submit. Your MCQ score (out of 25) will be shown. Short answers are marked separately.


Multiple Choice (25 marks)

Short Answer (5 marks — marked by lecturer)