Real-Time Research Recording: Can a Transformer Re-Derive Positional Info?

New experiment: Recording myself in real time as I do mechanistic interpretability research! I try to answer the question of what happens if you train a toy transformer without positional embeddings on the task of "predict the previous token". It turns out that a two-layer model can rederive them! You can watch me do it here, and you can follow along with my code here. This uses EasyTransformer, a transformer mechanistic interpretability library I'm writing, and this was a good excuse to test it out and create a demo!
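To make the task concrete, here is a minimal sketch of the data, assuming uniformly random token sequences. This is not the code from the recording, and the function name `make_prev_token_batch` is hypothetical. Because the tokens are random, the only way to predict the token at position i is to copy the one at position i - 1. Without positional embeddings, a causally masked attention layer at position i sees only the unordered set of tokens at positions 0..i. Picking out the immediately previous one therefore requires the model to reconstruct ordering information from somewhere.

```python
import random


def make_prev_token_batch(batch_size, seq_len, vocab_size, seed=None):
    """Hypothetical data generator for the 'predict the previous token' task.

    Each sequence is uniformly random tokens. The target at position i is the
    token at position i - 1. Position 0 has no previous token, so its target
    is -1, a placeholder meant to be masked out of the loss.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for _ in range(batch_size):
        seq = [rng.randrange(vocab_size) for _ in range(seq_len)]
        inputs.append(seq)
        # Shift right by one: target[i] = seq[i - 1]; -1 marks "no target".
        targets.append([-1] + seq[:-1])
    return inputs, targets


if __name__ == "__main__":
    xs, ys = make_prev_token_batch(batch_size=2, seq_len=6, vocab_size=10, seed=0)
    for x, y in zip(xs, ys):
        print("input: ", x)
        print("target:", y)
```

When training with PyTorch, the placeholder target could be skipped by passing `ignore_index=-1` to the cross-entropy loss. Alternatively, drop position 0 from the loss before computing it.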

This is an experiment in recording and publishing myself doing "warts and all" research - figuring out how to train the model and operationalising an experiment (including 15 mins debugging loss spikes...), real-time coding and tensor fuckery, and using my go-to toolkit. My hope is to give a flavour of what actual research can look like - how long do things actually take, how often do things go wrong, what is my thought process and what am I keeping in my head as I go, what being confused looks like, and how I try to make progress. I'd love to hear whether you found this useful, and whether I should bother making a second half!

Though I don't want to overstate this - this was still a small, self-contained toy question that I chose for being a good example task to record (and I wouldn't have published it if it was TOO much of a mess).
