The Insatiable Hunger for Compute: Powering Large Language Models
Introduction

In this day and age, language models are ubiquitous; we interact with them constantly, often without realizing it. Applications such as next-word prediction on a smartphone keyboard, suggestions while composing an email, or simply converting text to speech all use language models in some form. Furthermore, new research is being published at an unprecedented rate. Recently, these language models have grown very large and are now referred to as Large Language Models, or LLMs. Their size is on the order of billions of parameters and keeps growing. They also require colossal training datasets to produce meaningful results. In fact, one of the most widely used LLMs today, ChatGPT, was trained on approximately 570GB of data (which is surprisingly not large at all!) [2]. Given that the Internet has existed for a few decades now, it is not difficult to find data at that scale. The point to highl
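To get a sense of what "billions of parameters" means for hardware, here is a minimal back-of-the-envelope sketch in Python of the memory needed just to store a model's weights. The parameter counts and floating-point sizes below are illustrative assumptions for this sketch, not figures taken from any particular model.

```python
# Back-of-the-envelope estimate of how much memory an LLM's weights occupy.
# The parameter counts below are assumed, illustrative sizes, not official figures.

def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Gigabytes needed just to store the model parameters."""
    return num_params * bytes_per_param / 1e9

for num_params in (1e9, 10e9, 175e9):            # 1B, 10B, 175B parameters (assumed)
    fp32 = weight_memory_gb(int(num_params), 4)   # 32-bit floats: 4 bytes per parameter
    fp16 = weight_memory_gb(int(num_params), 2)   # 16-bit floats: 2 bytes per parameter
    print(f"{num_params / 1e9:>4.0f}B params -> {fp32:7.1f} GB (fp32), {fp16:7.1f} GB (fp16)")
```

Even this simple estimate shows that a model with hundreds of billions of parameters cannot fit on a single commodity GPU, and it only counts the weights; training additionally needs memory for gradients and optimizer state, which multiplies the requirement further.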