CP3501 · Week 5 NLP, Transformers & Hugging Face Quiz
Week 5  ·  Assessment Quiz

NLP, Transformers & Hugging Face

25 multiple-choice questions on tokenisation, Transformers, BERT, GPT and the Hugging Face ecosystem, plus 5 short-answer questions.

📋 30 questions total ⭐ 30 marks 🕐 No time limit 🔒 Answers not revealed
PART A

Multiple Choice  (25 marks)

Select the single best answer for each question. Each question is worth 1 mark.

Q1

Tokenisation in NLP is the process of:

Q2

A vocabulary in the context of an NLP model refers to:

Q3

A word embedding represents a word as:

Q4

The Transformer architecture is primarily built around:

Q5

BERT stands for:

Q6

In Hugging Face, pipeline("sentiment-analysis"):

Q7

Fine-tuning a pre-trained language model means:

Q8

Subword tokenisation (e.g. Byte-Pair Encoding) is preferred over word-level tokenisation because:

Q9

In BERT, the [CLS] token is used as:

Q10

The attention mechanism allows a Transformer to:

Q11

Positional encoding is added to token embeddings in Transformers to:

Q12

AutoTokenizer.from_pretrained("bert-base-uncased") loads:

Q13

Transfer learning in NLP involves:

Q14

A language model is a model that:

Q15

AutoModelForSequenceClassification is used for:

Q16

In Hugging Face tokenizers, padding is used to:

Q17

An encoder-only Transformer model (e.g. BERT) is best suited for:

Q18

A token ID is:

Q19

The attention mask in Hugging Face tokenizers tells the model:

Q20

GPT-style (decoder-only) models are primarily designed for:

Q21

BERT is pre-trained using:

Q22

Calling model.eval() before inference in PyTorch/Hugging Face:

Q23

What does tokenizer("Hello world", return_tensors="pt") return?

Q24

Named Entity Recognition (NER) is the task of:

Q25

The key architectural difference between RNNs and Transformers is that Transformers:

PART B

Short Answer  (5 marks — marked by lecturer)

Answer each question in 2–4 sentences. Precise technical language is expected. Code snippets are welcome where relevant.

Q26

Explain what tokenisation is and why subword tokenisation (such as Byte-Pair Encoding) is preferred over splitting on whitespace alone.

Q27

Describe the self-attention mechanism in a Transformer. What are queries, keys, and values, and how are they used to compute the attention output?

Q28

What is the difference between BERT and GPT in terms of architecture (encoder-only vs decoder-only) and the tasks each is best suited for?

Q29

Show how you would use the Hugging Face pipeline API to perform sentiment analysis on a list of sentences. What does the output look like? Include a brief code sketch in your answer.

Q30

Explain what fine-tuning a pre-trained language model means. What data is needed, what is trained, and why is it more efficient than training from scratch?

Complete all 30 questions, then click Submit. Your MCQ score (out of 25) will be shown. Short answers are marked separately.

✏️ Your 5 short-answer responses are recorded for your lecturer.
Full total: MCQ marks + short-answer marks, out of 30.