M2 · Lesson 4 — Reading Papers Effectively

Reading
Experimental Sections

Results tables look objective. They're not.
Learn to read what they hide, not just what they show.

M2 · L4 — Structure

What to look for

4 components in
every experiment section

01

Datasets

Standard benchmarks or cherry-picked? Dense or sparse? How old are they?

02

Baselines

Are the strongest recent methods included? Are implementations fair?

03

Metrics

Do they report only the metrics where they win? Are evaluation protocols valid?

04 — most revealing

Ablation Study

Which component, when removed, hurts the most? That's where the real contribution is — and your research opportunity.

M2 · L4 — Results Tables

How to read a results table

Don't start at
the best number

Method            ACC     R@3     R@5
KG-Text           0.076   —
KAPING            0.079   —
G-retriever       0.274   0.532   0.650
K-RagRec (ours)   0.435   0.725   0.831

Find the weakest result first. Which dataset, metric, or backbone shows the smallest gap over the best baseline? That's where the method is most fragile.

K-RagRec on LLaMA-3: the relative improvement on R@5 is only 2.5%, far below the 27.8% on LLaMA-2 (0.650 → 0.831 in the table above). Why? The paper doesn't explain the difference clearly.
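
One way to make "find the weakest result" concrete is to compute the relative gap over the best baseline for each metric. The sketch below uses the ACC, R@3 and R@5 values from the table above. The dictionary layout and the `relative_gaps` helper are illustrative assumptions, blank or dashed cells are treated as not reported, and every metric is assumed to be higher-is-better.

```python
# Values copied from the results table; None marks a cell with no reported number.
results = {
    "KG-Text":         {"ACC": 0.076, "R@3": None,  "R@5": None},
    "KAPING":          {"ACC": 0.079, "R@3": None,  "R@5": None},
    "G-retriever":     {"ACC": 0.274, "R@3": 0.532, "R@5": 0.650},
    "K-RagRec (ours)": {"ACC": 0.435, "R@3": 0.725, "R@5": 0.831},
}
OURS = "K-RagRec (ours)"


def relative_gaps(results, ours):
    """Relative improvement of `ours` over the best baseline, per metric."""
    gaps = {}
    for metric, our_score in results[ours].items():
        baseline_scores = [
            scores[metric]
            for name, scores in results.items()
            if name != ours and scores.get(metric) is not None
        ]
        if our_score is None or not baseline_scores:
            continue  # nothing to compare against for this metric
        best = max(baseline_scores)
        gaps[metric] = (our_score - best) / best
    return gaps


gaps = relative_gaps(results, OURS)
weakest = min(gaps, key=gaps.get)

# Print the smallest gap first: that is the row/column to question.
for metric, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{metric}: +{gap:.1%} over best baseline")
print("Weakest metric:", weakest)
```

On this table the smallest gap is on R@5, which matches the 27.8% figure quoted above. Running the same calculation per dataset and per backbone is how a gap like the LLaMA-3 one would surface.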

M2 · L4 — Common Tricks

Patterns that inflate results

Tricks papers use
(sometimes unconsciously)

🎯

Random negative sampling

Evaluating on 1 positive + 19 random negatives is much easier than full ranking. Numbers look good but don't reflect real recommendation quality.
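
To see how large the effect can be, here is a minimal simulation sketch. It assumes a hypothetical catalogue of 1,000 items in which the model ranks the true item 100th under full ranking, then measures how often that same item lands in the top 5 when it competes against only 19 uniformly sampled negatives. The catalogue size, the rank, and the `sampled_hit_rate` helper are all illustrative assumptions, not numbers from any paper.

```python
import random


def sampled_hit_rate(full_rank, n_items, n_negatives=19, k=5, trials=10_000, seed=0):
    """Estimate how often the positive is in the top-k of a sampled candidate list.

    full_rank: 1-indexed rank of the positive among all n_items under full ranking.
    """
    rng = random.Random(seed)
    # Full-ranking positions of every negative item (all ranks except the positive's).
    negative_ranks = [r for r in range(1, n_items + 1) if r != full_rank]
    hits = 0
    for _ in range(trials):
        sampled = rng.sample(negative_ranks, n_negatives)
        # Only sampled negatives that outrank the positive can push it down.
        outranking = sum(1 for r in sampled if r < full_rank)
        if outranking < k:
            hits += 1
    return hits / trials


full_rank, n_items, k = 100, 1000, 5
full_hit = int(full_rank <= k)  # 0: rank 100 is a miss under full ranking
sampled_rate = sampled_hit_rate(full_rank, n_items, k=k)

print("Full-ranking hit@5:", full_hit)
print("Sampled hit@5 rate:", sampled_rate)
```

Under these assumptions the item is a clear miss in full ranking, yet it counts as a hit in the large majority of sampled trials, because few of the 19 random negatives come from the 99 items that actually outrank it.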

📊

Cherry-picked datasets

Old, dense datasets (MovieLens-1M is from 2003) don't reflect modern recommender-system challenges. Results may not transfer to newer, sparser data.

🔬

Weak baseline implementations

If authors reimplement baselines themselves instead of using the original code, the baselines may be under-tuned and score lower than they should. Check whether the paper uses or cites the original implementation.

📈

Selective metric reporting

If a paper reports Recall@3 and Recall@5 but not NDCG, ask why. Different metrics tell different stories.
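
A small sketch of why the omission matters: Recall@K only checks whether the relevant item appears in the top K, while NDCG@K also rewards placing it higher. The two rankings below are made-up toy lists, and the helper functions are a standard binary-relevance formulation written for illustration, not code from any paper.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal ordering's gain."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg


relevant = {"target"}
model_a = ["target", "b", "c", "d", "e"]  # relevant item ranked 1st
model_b = ["b", "c", "d", "e", "target"]  # relevant item ranked 5th

for name, ranking in [("A", model_a), ("B", model_b)]:
    print(
        f"Model {name}: Recall@5 = {recall_at_k(ranking, relevant, 5):.3f}, "
        f"NDCG@5 = {ndcg_at_k(ranking, relevant, 5):.3f}"
    )
```

Both toy models get the same Recall@5, but model B's NDCG@5 is much lower because its hit sits at the bottom of the list. A paper that reports only recall cannot distinguish these two cases.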

M2 · L4 — Ablation Studies

The most underread part of any paper

Ablation studies
reveal the truth

An ablation removes one component at a time to show its contribution. Read it as:

"If I remove X and performance drops a lot — X is the real contribution."

The component with the biggest drop is what the paper is actually about — and your best target for extension.

K-RagRec ablation (ML-1M ACC)

Variant           ACC
K-RagRec (full)   0.435
− Re-ranking      0.357
− Indexing        0.274
− Popularity      0.309
− GNN Encoder     0.196   ← biggest drop

Insight: GNN Encoder is the core contribution. Everything else is supporting infrastructure.
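
The same reading can be done mechanically by ranking the components by how much ACC falls when each is removed. The sketch below uses only the ML-1M ACC numbers listed above; the variable names are illustrative.

```python
# ACC values from the K-RagRec ablation on ML-1M.
full_acc = 0.435
ablated_acc = {
    "Re-ranking": 0.357,
    "Indexing": 0.274,
    "Popularity": 0.309,
    "GNN Encoder": 0.196,
}

# For each removed component: (absolute drop, drop relative to the full model).
drops = {
    name: (full_acc - acc, (full_acc - acc) / full_acc)
    for name, acc in ablated_acc.items()
}

# Largest drop first: the top entry is the component the method depends on most.
ranked = sorted(drops.items(), key=lambda kv: kv[1][0], reverse=True)
for name, (abs_drop, rel_drop) in ranked:
    print(f"- {name:<12} drop = {abs_drop:.3f} ({rel_drop:.1%} of full ACC)")
```

Removing the GNN Encoder costs more than half of the full model's ACC, while removing re-ranking costs under a fifth. Relative drops make that ordering easier to compare across datasets than raw scores.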

M2 · L4 — Synthesis

Full picture on K-RagRec

Reading K-RagRec
experiments critically

What they show well

Consistent improvement across 3 datasets
Efficiency comparison (Table 2) — inference time
Zero-shot generalisation study (Table 6)
Hallucination reduction (Table 7)
Multiple backbone LLMs tested

What they hide

No comparison to traditional collaborative-filtering (CF) baselines
Random negative evaluation inflates numbers
LLaMA-3 improvements much smaller — unexplained
p=50% threshold not rigorously justified
KG completeness not discussed

M2 · L4 — Checklist

Use every time

The experiment
reading checklist

What datasets? Standard or cherry-picked?
What baselines? Any strong method missing?
What metrics? Any notable omissions?
What's the evaluation protocol? Random or hard negatives?

Find the weakest result — which row/column?
Read the ablation — what's the biggest drop?
Does the conclusion match what the tables show?
What would break this method?

The last question is the most important. "What would break this method?" — your answer to that is the seed of your next research idea.

M2 Complete — Reading Papers Effectively

Module 2 complete

L1 · Anatomy

Know the skeleton

Methodology is the paper. Contributions list is the contract. Read gaps, not just results.

L2 · 3-Pass

Read efficiently

5 min → 30 min → 2 hrs. Stop when you have what you need. Most papers only need Pass 2.

L3–L4 · Critical

Question everything

Hidden assumptions, missing baselines, weak evaluation. Ablations reveal the real contribution.

Next: M3 · Documenting Findings — building your systematic knowledge base
