Productizing Large Language Models « Machine Learning Times

Originally posted on Replit.com, September 21, 2022.

Large Language Models (LLMs) are known for their near-magical ability to learn from very few examples – as few as zero – to create linguistic wonders. LLMs can chat, write poetry, write code, and even do arithmetic. However, the same properties that make LLMs magical also make them challenging from an engineering perspective.

At Replit, we’ve deployed transformer-based language models of all sizes: ~100 million parameter models for search and spam, 1-10B parameter models for a code completion product we call Ghostwriter, and 100B+ parameter models for features that require more reasoning ability. In this article, we’ll talk about what we’ve learned about creating and hosting large language models.
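To put that size range in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone occupy. It assumes 16-bit (fp16/bf16) weights at 2 bytes per parameter and ignores activations, the attention KV cache, and any runtime overhead, so real serving needs more. The helper name `weight_memory_gb` is hypothetical, chosen for illustration.

```python
# Weights-only memory estimate: parameters x bytes per parameter.
# Assumption: activations, KV cache, and framework overhead are ignored,
# so these numbers are a lower bound on what serving actually needs.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Sizes mentioned in the text; 100B is the lower end of "100B+".
MODEL_SIZES = [
    ("search and spam", 100e6),
    ("code completion (1B)", 1e9),
    ("code completion (10B)", 10e9),
    ("reasoning (100B)", 100e9),
]

for label, params in MODEL_SIZES:
    print(f"{label:>22}: {weight_memory_gb(params):7.1f} GB of fp16 weights")
```

Under these assumptions the weights grow from about 0.2 GB for a 100M-parameter model to about 200 GB for a 100B-parameter one, a thousandfold jump that goes a long way toward explaining why the largest models are the hardest to host.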

Absurdity

Any sufficiently advanced bullshit is indistinguishable from intelligence, or so the LLM thought. LLMs are super suggestible; in fact, the primary way to interact with them is through “prompting”. You give the LLM a string of text and it generates a response, usually text, although some models can also generate audio or even images. The problem is that if you prompt an LLM with nonsense, it will generate nonsense: garbage in, garbage out. LLMs also tend to get stuck in loops, repeating the same thing over and over, because they have a limited attention span when dealing with new scenarios that weren’t present during training.
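The looping failure is easiest to see in an exaggerated toy. The sketch below is not a real LLM: it assumes a made-up next-token score table (`NEXT_TOKEN_SCORES`, hypothetical) whose "attention span" is a single previous token, and decodes it greedily. With a finite vocabulary and a deterministic choice at every step, generation has to fall into a cycle. A simple hypothetical check, `find_loop`, flags text whose tail repeats back-to-back. Real LLMs condition on much longer contexts and are usually sampled rather than decoded greedily, so this illustrates the effect rather than modeling it.

```python
from typing import Dict, List, Optional

# Hypothetical toy "model": scores for the next token given ONLY the previous one.
NEXT_TOKEN_SCORES: Dict[str, Dict[str, float]] = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.7, "the": 0.3},
    "sat": {"on": 0.9, "the": 0.1},
    "on": {"the": 1.0},
    "mat": {"the": 1.0},
}


def greedy_decode(prompt: str, max_tokens: int) -> List[str]:
    """Repeatedly append the highest-scoring next token, looking only at the last token."""
    tokens = [prompt]
    for _ in range(max_tokens):
        scores = NEXT_TOKEN_SCORES[tokens[-1]]
        tokens.append(max(scores, key=lambda t: scores[t]))
    return tokens


def find_loop(tokens: List[str], max_period: int = 8) -> Optional[List[str]]:
    """Return the shortest block that occurs twice back-to-back at the end of `tokens`, else None."""
    for period in range(1, max_period + 1):
        if 2 * period <= len(tokens) and tokens[-period:] == tokens[-2 * period:-period]:
            return tokens[-period:]
    return None


if __name__ == "__main__":
    out = greedy_decode("the", max_tokens=8)
    print(" ".join(out))   # the cat sat on the cat sat on the
    print(find_loop(out))  # ['cat', 'sat', 'on', 'the']
```

Because the toy only ever sees one token of context, nothing in its input tells it that it has already said "cat sat on the"; a longer context window and sampling make this less likely in practice, but do not rule it out.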

