nanoGPT
nanoGPT is a rewrite of minGPT that prioritizes teeth over education. It is still under active development, but currently train.py reproduces GPT-2 (124M) on OpenWebText, running on a single 8xA100 40GB node in about 4 days of training. https://github.com/karpathy/nanoGPT (written in Python)
The code itself is plain and readable: train.py is a ~300-line boilerplate training loop and model.py a ~300-line GPT model definition, which can optionally load the GPT-2 weights from OpenAI.
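For a sense of how small the surface area is, here is a minimal sketch of loading the OpenAI weights and sampling, assuming model.py's GPT.from_pretrained and generate helpers look roughly as they do in the repo:

    import torch
    from model import GPT   # nanoGPT's ~300-line model definition

    model = GPT.from_pretrained('gpt2')   # fetches the 124M OpenAI checkpoint via HuggingFace
    model.eval()

    idx = torch.zeros((1, 1), dtype=torch.long)   # dummy single-token context
    out = model.generate(idx, max_new_tokens=20)  # autoregressive sampling
    print(out[0].tolist())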
install
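The dependency list is short; something along these lines (the exact package set may have drifted since):

    pip install torch numpy transformers datasets tiktoken wandb tqdm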
usage
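In outline: tokenize a dataset with its prepare.py script, then point train.py at it. A hedged sketch of the OpenWebText reproduction run on the 8-GPU node, with script and config names as I recall them from the repo:

    python data/openwebtext/prepare.py
    torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py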
baselines
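Baselines come from evaluating the released OpenAI GPT-2 checkpoints on OpenWebText; if memory serves, the repo ships eval configs for each model size, roughly:

    python train.py config/eval_gpt2.py         # 124M
    python train.py config/eval_gpt2_medium.py  # 350M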
finetuning
Finetuning takes very little time, e.g. just a few minutes on a single GPU.
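The repo's Shakespeare example illustrates the workflow (dataset and config names taken from the repo, so treat the exact paths as approximate):

    python data/shakespeare/prepare.py
    python train.py config/finetune_shakespeare.py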
efficiency notes
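A large share of the speed comes from PyTorch 2.0's compiler: train.py wraps the model in torch.compile when it is available. A minimal sketch of that pattern, using a stand-in module rather than nanoGPT's actual GPT class (torch.compile itself is standard PyTorch 2.0 API):

    import torch
    import torch.nn as nn

    model = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)  # stand-in for the GPT module
    if hasattr(torch, 'compile'):      # available from PyTorch 2.0
        model = torch.compile(model)   # JIT-compiles the forward pass for a sizable speedup

    x = torch.randn(8, 64, 768)        # (batch, seq, dim)
    y = model(x)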