This repository was created as an exercise to gain a deeper understanding of LLMs, transformers, and, more concretely, the attention mechanism (which is written here from scratch).
We code up our own "itty bitty GPT" and train it on a chunk of the TinyStories dataset.
This is far from the most efficient implementation, but I think it is quite readable and easy to follow. The exercise is inspired by an excellent LLM interpretability course that I sat in on.
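To give a flavor of what "from scratch" means here, below is a minimal sketch of a single causal scaled dot-product attention head in PyTorch. The class and variable names are illustrative, not taken from this repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    """One causal self-attention head (illustrative sketch, not the repo's exact code)."""

    def __init__(self, d_model: int, d_head: int, max_len: int = 256):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)
        # Causal mask: each position may only attend to itself and earlier positions.
        self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Scaled dot-product attention scores: (batch, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
        scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        return weights @ v  # (batch, seq_len, d_head)
```

A full transformer block would stack several such heads, combine their outputs, and follow them with a feed-forward layer; that is roughly the path the tutorial notebook walks through.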
The tutorial.py notebook houses the model, dataset creation, training, and prompting, all in one place. Working through it sequentially takes you from the attention head mechanism all the way to prompting a model trained on your own device.
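As an illustration of that final prompting step, here is a minimal autoregressive sampling loop for a trained causal language model. The `model` and `tokenizer` objects, their `encode`/`decode` methods, and the temperature setting are assumptions for illustration, not the notebook's actual interface.

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 100, temperature: float = 1.0):
    """Sample tokens one at a time from a trained causal LM (illustrative sketch)."""
    model.eval()
    tokens = torch.tensor([tokenizer.encode(prompt)])      # (1, seq_len), assumed encode() -> list[int]
    for _ in range(max_new_tokens):
        logits = model(tokens)                              # (1, seq_len, vocab_size), assumed output shape
        next_logits = logits[:, -1, :] / temperature        # keep only the last position's logits
        probs = torch.softmax(next_logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)     # append the sampled token and continue
    return tokenizer.decode(tokens[0].tolist())

# Hypothetical usage: print(generate(model, tokenizer, "Once upon a time"))
```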
The modular folder has a more conventional structure: the model and text dataset are defined in separate files, with separate notebooks for training and prompting.
This is my first stab at this, and it is very much a work in progress. At the moment, the stories generated by the model are not coherent, so there is much work to be done!