Before writing a single line of code — plan everything.
Surprises during implementation usually trace back to gaps in the plan.
A good implementation plan forces you to answer five sets of questions:
Data: Download, preprocess, split. What format? What filtering?
Model: List every equation. Map each to a module or layer.
Training: Loss function. Optimiser. Batch size. Negative sampling. Epochs.
Evaluation: Which metrics? Which split? Full ranking or candidate-based?
Hyperparams: Which values to tune? What search space? How many runs?
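The evaluation question "full ranking or candidate-based?" matters because the same model scores can yield very different ranks under the two protocols. The sketch below is only an illustration: the scores, the item ids, and the helper `rank_of` are made up for this example and do not come from any paper.

```python
def rank_of(target, scores, candidates):
    """Return the 1-based rank of `target` among `candidates` (higher score = better)."""
    target_score = scores[target]
    better = sum(1 for c in candidates if c != target and scores[c] > target_score)
    return 1 + better

# Made-up model scores for six items; item 3 is the held-out ground truth.
scores = {0: 0.9, 1: 0.1, 2: 0.8, 3: 0.7, 4: 0.3, 5: 0.6}
target = 3

# Full ranking: the target competes against every item in the catalogue.
full_rank = rank_of(target, scores, list(scores))

# Candidate-based: the target competes only against a few sampled negatives
# (fixed here as items 1 and 4 so the example is deterministic).
cand_rank = rank_of(target, scores, [target, 1, 4])

print(full_rank, cand_rank)
```

With these illustrative numbers the target misses the top position under full ranking but takes it under the candidate-based protocol, which is why the plan should record which protocol is used.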
Rule: You should be able to write the full plan in one sitting, before touching code. If you can't — you haven't read the paper carefully enough.
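One way to make the "one sitting" rule checkable is to write the plan as a small data structure and ask which phases are still empty. This is a minimal sketch, not a prescribed format: the class `Plan` and the method `unfinished` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Plan:
    # One list of concrete decisions per phase; an empty list means "not planned yet".
    data: list = field(default_factory=list)
    model: list = field(default_factory=list)
    training: list = field(default_factory=list)
    evaluation: list = field(default_factory=list)
    hyperparams: list = field(default_factory=list)

    def unfinished(self):
        """Return the names of phases with no decisions recorded, in phase order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

plan = Plan(data=["download", "preprocess", "split"])
print(plan.unfinished())
```

If `unfinished()` still returns phases after you have read the paper, that is a signal to go back to the paper rather than to the editor.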
Recommended build order for K-RagRec:
WHY BUILD IN THIS ORDER?
Each step depends on the previous one. You can test Steps 1–3 independently of the LLM. If you start at Step 7, a failure could originate in any earlier step, so you cannot localise it.
Verify shapes and outputs at each step. A silent bug in Step 2 will corrupt everything downstream.
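Shape checks between steps can be as simple as an assertion helper called after each stage. The sketch below uses plain nested lists so it runs without any tensor library; `shape_of`, `check_shape`, `embed`, and `mean_pool` are hypothetical stand-ins, not the actual K-RagRec steps.

```python
def shape_of(x):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)

def check_shape(name, actual, expected):
    """Raise ValueError naming the step if the shape is not what the plan says."""
    if actual != expected:
        raise ValueError(f"{name}: expected shape {expected}, got {actual}")

def embed(ids, dim):
    # Stand-in for an early step: one vector of length `dim` per id.
    return [[float(i)] * dim for i in ids]

def mean_pool(vectors):
    # Stand-in for the next step: average all vectors into a single vector.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

vecs = embed([1, 2, 3], dim=4)
check_shape("embed", shape_of(vecs), (3, 4))      # verify before moving on

pooled = mean_pool(vecs)
check_shape("mean_pool", shape_of(pooled), (4,))  # verify the next step too
```

A transposed or mis-sized output then fails loudly at the step that produced it instead of surfacing several steps later.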
A weak plan is a wish list, not a plan: no shapes, no equations, no missing details flagged.
A strong plan is specific enough that a second person could implement it independently.
Σ = loop. argmin = optimiser. ← = update. List inputs, outputs, loops, updates — in that order.
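As a small illustration of this mapping, take the toy objective w* = argmin_w Σ_i (w·x_i − y_i)², minimised with the update w ← w − η·∇L(w). The data, the learning rate, and the step count below are assumptions chosen only to make the example converge; they are not taken from any paper.

```python
# Inputs: toy data where the true relationship is y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def loss(w):
    total = 0.0
    for x, y in zip(xs, ys):            # Σ over data points -> a loop
        total += (w * x - y) ** 2
    return total

def grad(w):
    g = 0.0
    for x, y in zip(xs, ys):            # the same Σ, applied to the derivative
        g += 2.0 * (w * x - y) * x
    return g

# Output: the parameter w. argmin -> an optimiser, here plain gradient descent.
w = 0.0
lr = 0.01                               # η, an illustrative learning rate
for _ in range(200):
    w = w - lr * grad(w)                # the ← in the equation -> an in-place update

print(round(w, 6))
```

Reading the equation in the order inputs, outputs, loops, updates gives the structure of the code almost line for line.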
Details papers often omit: initialisation, negative sampling, LR schedule, preprocessing. To recover them, check appendix → code repo → cited papers → field defaults.
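The lookup order appendix → code repo → cited papers → field defaults can be written down as a small resolver that also records where each value came from, so the plan shows which details are sourced and which are guesses. This is a sketch under assumptions: `SOURCE_ORDER`, `resolve`, and every value in `found` are hypothetical placeholders, not findings from a real paper.

```python
# Sources in the order they should be consulted.
SOURCE_ORDER = ["appendix", "code_repo", "cited_papers", "field_default"]

def resolve(detail, found):
    """Return (value, source) from the first source that specifies `detail`.

    `found` maps a source name to a dict of the details located there.
    Returns (None, None) when no source specifies the detail.
    """
    for source in SOURCE_ORDER:
        value = found.get(source, {}).get(detail)
        if value is not None:
            return value, source
    return None, None

# Placeholder notes, invented for illustration only.
found = {
    "code_repo": {"lr_schedule": "cosine"},
    "field_default": {"lr_schedule": "constant", "initialisation": "xavier_uniform"},
}

for detail in ["initialisation", "negative_sampling", "lr_schedule", "preprocessing"]:
    value, source = resolve(detail, found)
    print(f"{detail}: {value} (from {source})")
```

Any detail that resolves to `(None, None)` is one to flag explicitly in the plan as still missing.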
5 phases: Data → Model → Training → Evaluation → Hyperparams. Specific enough that someone else could implement it.