LLM | kkirchheim.de

Training a German LLM from Scratch, 14 Nov. 2024 (posts)

This article is not finished and will be updated. The research group I work with has access to a small GPU cluster, which occasionally sits idle. To avoid wasting valuable compute (idle GPUs essentially burn money through opportunity cost), I decided to train a German GPT-2-style model from scratch, using only German text. Existing German models available on Hugging Face have 137M parameters and a context length of 1024 tokens, which is quite limited compared to recently released …

Categories: Deep Learning

2794 Words, Tagged with: Deep Learning · Generative Models · LLM