This is the accompanying website for our paper, Progress Measures for Grokking via Mechanistic Interpretability. We also provide the code used to train our models here.

Figure 1: The algorithm implemented by the one-layer transformer for modular addition. Given two numbers a and b, the model maps each to a corresponding rotation using its embedding matrix. Using its attention and MLP layers, it then composes the rotations to obtain a representation of a + b (mod P). Finally, it “reads off” the logits for each c ∈ {0, 1, ..., P − 1} by rotating by −c to get cos(w(a + b − c)), which is maximized when a + b ≡ c (mod P), since w is a multiple of 2π/P.
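To make the caption concrete, below is a minimal NumPy sketch of this rotation-composition algorithm for a single hypothetical frequency w = 2πk/P (the value of k here is illustrative); the trained transformer distributes the same computation across several learned frequencies in its embedding, attention, and MLP layers rather than executing it this directly.

```python
import numpy as np

P = 113                      # modulus used in the paper
k = 17                       # hypothetical key frequency; the model learns several
w = 2 * np.pi * k / P        # w is a multiple of 2*pi/P

def mod_add_logits(a, b):
    """Sketch of the rotation algorithm for a single frequency w.

    Embedding: represent a and b as rotations by angles w*a and w*b.
    Composition: combine the rotations via trig identities to get
      cos(w*(a+b)) and sin(w*(a+b)).
    Read-off: the logit for each candidate c is cos(w*(a+b-c)),
      which is maximal exactly when a + b = c (mod P).
    """
    cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)  # cos(w(a+b))
    sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)  # sin(w(a+b))
    c = np.arange(P)
    # cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc)
    return cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)

a, b = 41, 97
logits = mod_add_logits(a, b)
assert logits.argmax() == (a + b) % P   # the correct residue gets the largest logit
```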