Dec 11, 2023
Introducing gigaGPT: GPT-3 sized models in 565 lines of code
Posted by Cecile G. Tamura in categories: computing, transportation
Cerebras introduces gigaGPT: GPT-3 sized models in 565 lines of code.
GigaGPT is Cerebras’ implementation of Andrej Karpathy’s nanoGPT – the simplest and most compact code base to train and fine-tune GPT models. Whereas nanoGPT can train models in the 100M parameter range, gigaGPT trains models well over 100B parameters. We do this without introducing additional code or relying on third-party frameworks – the entire repo is just 565 lines of code. Instead, gigaGPT utilizes the large memory and compute capacity of Cerebras hardware to enable large-scale training on vanilla torch.nn code. With no modifications, gigaGPT supports long context lengths and works with a variety of optimizers.
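To give a sense of what "vanilla torch.nn code" means here, below is a minimal sketch of a GPT-style decoder written only with standard PyTorch modules. It is not the gigaGPT source; the layer choices and hyperparameters are illustrative, and the actual repo may structure things differently. The point it illustrates is that, in this framing, moving from a GPT-2-small-sized model to a GPT-3-sized one is mostly a matter of changing the width, depth, and context-length numbers rather than rewriting the model code.

```python
# A minimal sketch (not the actual gigaGPT source) of a GPT-style model built
# entirely from vanilla torch.nn modules; all hyperparameters are illustrative.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal self-attention: each position may only attend to earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    def __init__(self, vocab_size, max_seq_len, d_model=768, n_heads=12, n_layers=12):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_seq_len, d_model)
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids -> (batch, seq_len, vocab_size) logits
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_f(x))

# Scaling up is just changing these numbers; on conventional GPUs the memory
# would not fit, which is where the large-memory Cerebras hardware comes in.
model = GPT(vocab_size=50257, max_seq_len=1024)
logits = model(torch.randint(0, 50257, (2, 128)))
print(logits.shape)  # torch.Size([2, 128, 50257])
```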
Why gigaGPT