25 multiple-choice questions on tokenisation, Transformers, BERT, GPT and the Hugging Face ecosystem, plus 5 short-answer questions.
Select the single best answer for each question. Each question is worth 1 mark.
Tokenisation in NLP is the process of:
A vocabulary in the context of an NLP model refers to:
A word embedding represents a word as:
The Transformer architecture is primarily built around:
BERT stands for:
In Hugging Face, pipeline("sentiment-analysis"):
Fine-tuning a pre-trained language model means:
Subword tokenisation (e.g. Byte-Pair Encoding) is preferred over word-level tokenisation because:
In BERT, the [CLS] token is used as:
The attention mechanism allows a Transformer to:
Positional encoding is added to token embeddings in Transformers to:
AutoTokenizer.from_pretrained("bert-base-uncased") loads:
Transfer learning in NLP involves:
A language model is a model that:
AutoModelForSequenceClassification is used for:
In Hugging Face tokenizers, padding is used to:
An encoder-only Transformer model (e.g. BERT) is best suited for:
A token ID is:
The attention mask in Hugging Face tokenizers tells the model:
GPT-style (decoder-only) models are primarily designed for:
BERT is pre-trained using:
Calling model.eval() before inference in PyTorch/Hugging Face:
What does tokenizer("Hello world", return_tensors="pt") return?
Named Entity Recognition (NER) is the task of:
The key architectural difference between RNNs and Transformers is that Transformers:
Answer each question in 2–4 sentences. Precise technical language is expected. Code snippets are welcome where relevant.
Explain what tokenisation is and why subword tokenisation (such as Byte-Pair Encoding) is preferred over splitting on whitespace alone.
Describe the self-attention mechanism in a Transformer. What are queries, keys, and values, and how are they used to compute the attention output?
What is the difference between BERT and GPT in terms of architecture (encoder-only vs decoder-only) and the tasks each is best suited for?
Show how you would use the Hugging Face pipeline API to perform sentiment analysis on a list of sentences. What does the output look like?
Explain what fine-tuning a pre-trained language model means. What data is needed, what is trained, and why is it more efficient than training from scratch?
Complete all 30 questions, then click Submit. Your MCQ score (out of 25) will be shown. Short answers are marked separately.