Reading is passive. Questioning is what makes you a researcher.
Critical questions help you:
Not what they say — what do they actually solve? Are these the same thing?
Every formulation buries assumptions. What are they assuming is true?
Are hyperparameters tuned equally? Are strong recent baselines missing?
Does it work on different datasets? Different domains? Cold-start?
Remove each component — what actually drives the performance gain?
Add one specific to your research area.
Papers often claim to solve one problem while actually solving a narrower or different one. Look for the gap.
Claimed: Addresses LLM hallucination and lack of domain knowledge in recommender systems (RS).
Actual: Addresses this only when a knowledge graph (KG) exists for the domain, and one is not always available in practice.
Ask yourself: If I removed the KG, what problem is left? Is the problem actually the KG retrieval design, not LLM hallucination itself?
Authors rarely announce their assumptions. You have to find them by reading the formulation carefully.
Does the method assume explicit ratings? Dense interactions? A specific popularity distribution?
Independence of ratings? Linear relationships? That the KG is complete and accurate?
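Random negative sampling? Leave-one-out? These evaluation choices can change reported numbers substantially, and ranking against a small set of sampled negatives in particular tends to inflate them.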
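To see why the evaluation protocol matters, here is a minimal sketch comparing HR@10 under full ranking with HR@10 against 99 sampled negatives. Everything in it is an assumption made for illustration: the scores are synthetic Gaussian draws, not output from any real model, and the names `hit_at_k` and `compare_protocols` are hypothetical.

```python
import random

def hit_at_k(target_score, candidate_scores, k=10):
    """Return 1 if the held-out item ranks in the top k among the candidates."""
    rank = 1 + sum(1 for s in candidate_scores if s > target_score)
    return 1 if rank <= k else 0

def compare_protocols(n_users=300, n_items=1000, n_sampled=99, k=10, seed=0):
    """Score the same synthetic model under two evaluation protocols."""
    rng = random.Random(seed)
    full_hits = sampled_hits = 0
    for _ in range(n_users):
        # Hypothetical scores: negatives ~ N(0, 1), held-out positive ~ N(1.5, 1).
        negatives = [rng.gauss(0.0, 1.0) for _ in range(n_items - 1)]
        target = rng.gauss(1.5, 1.0)
        # Protocol A: rank the positive against every other item.
        full_hits += hit_at_k(target, negatives, k)
        # Protocol B: rank the positive against a random subset of negatives.
        sampled_hits += hit_at_k(target, rng.sample(negatives, n_sampled), k)
    return full_hits / n_users, sampled_hits / n_users

full_hr, sampled_hr = compare_protocols()
print(f"HR@10, full ranking:         {full_hr:.3f}")
print(f"HR@10, 99 sampled negatives: {sampled_hr:.3f}")
```

The sampled negatives are a subset of all negatives, so the positive item can never rank worse under sampling. The same model therefore looks at least as good, and usually better, under the sampled protocol. That is why numbers from papers using different protocols should not be compared directly.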
The popularity threshold p=50% treats popularity as a clean binary split. In reality, item popularity is continuous and typically long-tailed (roughly power-law), so the threshold is a design choice presented as fact.
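A quick way to test this kind of assumption is to apply the threshold to a long-tailed distribution and look inside each bucket. The sketch below is illustrative only. It assumes the threshold labels the top half of items by interaction count as "popular", which the text here does not spell out, and it uses synthetic Zipf-like counts rather than any real dataset. The function name `zipf_counts` is hypothetical.

```python
def zipf_counts(n_items=1000, exponent=1.0, scale=10000.0):
    """Synthetic interaction counts where the item at rank r gets scale / r**exponent."""
    return [scale / (rank ** exponent) for rank in range(1, n_items + 1)]

counts = zipf_counts()        # already sorted from most to least popular
cut = len(counts) // 2        # assumed meaning of p=50%: top half = "popular"
head, tail = counts[:cut], counts[cut:]

# Share of all interactions that fall in the "popular" bucket.
head_share = sum(head) / sum(counts)
# Spread inside the "popular" bucket: most popular vs. least popular member.
head_spread = head[0] / head[-1]

print(f"Share of interactions in the 'popular' half: {head_share:.1%}")
print(f"Most vs. least popular item inside that half: {head_spread:.0f}x")
```

Under these assumptions, the single "popular" label covers items whose interaction counts differ by orders of magnitude. That is the sense in which a binary threshold hides the shape of the distribution.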
Ask: different domain? Different language? No KG available? Cold-start users?
K-RagRec: They do test generalisation (Table 6) — MovieLens→Amazon Book zero-shot. Good. But still only movie/book domains with KG coverage.
When they remove the GNN Encoder, accuracy drops 37–45%. That's the largest drop from removing any single component.
Implication: The graph encoding is doing most of the work — not the retrieval strategy itself.
💡 Research insight: If encoding matters most, could you get similar results with better text encoding instead of a KG?
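When reading an ablation table, it helps to convert each row into a relative drop against the full model and sort the rows. The sketch below shows one way to do that. The scores and component names are placeholders invented for illustration, not numbers from K-RagRec or any other paper, and `relative_drops` is a hypothetical helper.

```python
def relative_drops(full_score, ablated_scores):
    """Return (variant, percent drop vs. the full model), largest drop first."""
    drops = {
        name: (full_score - score) / full_score * 100.0
        for name, score in ablated_scores.items()
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

# Placeholder numbers only: substitute the values from the paper's ablation table.
full_model_accuracy = 0.40
ablated_accuracy = {
    "without_component_a": 0.24,
    "without_component_b": 0.36,
    "without_component_c": 0.38,
}

ranking = relative_drops(full_model_accuracy, ablated_accuracy)
for name, drop in ranking:
    print(f"{name}: {drop:.1f}% relative drop")
```

The variant at the top of the ranking is the component the method depends on most, which makes it a natural first place to look for an extension idea.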
The 5 questions aren't meant to be run mechanically. They're lenses — use the ones that apply.
Remove component X → performance drops → X is important → can you do X better? That's a paper.
Decide whether to be convinced. Every result has a context that limits it.
The most revealing thing about an experiment is often what was left out.
The component with the biggest drop is often the best target for your extension idea.
Next: M2 · L4 — Reading Experimental Sections