Small Models, Big Impact (Part I)

The Emergence of Small Language Models

Jackson Chen, Tensility Intern and MBA candidate at Northwestern University Kellogg School of Management

Armando Pauker, Managing Director at Tensility Venture Partners

Wayne Boulais, Managing Director at Tensility Venture Partners

Mohamed AlTantawy, CTO & Founder at Agolo

In November 2022, OpenAI introduced ChatGPT, marking a significant milestone in the development of Large Language Models (LLMs). Since then, growing investment has produced bigger and better models such as GPT-4, Gemini Ultra, and Claude, ushering in a new era in artificial intelligence (AI). These models, characterized by hundreds of billions of parameters, deliver impressive results for both novice and expert users. Their ability to understand and generate diverse content, including text, images, and audio, is reshaping the possibilities of AI applications. From coding assistance and answering complex queries to summarizing documents and performing quantitative analyses, the capabilities of these LLMs are expansive and groundbreaking.

As LLMs continue to advance, a new wave of innovation is emerging around Small Language Models (SLMs). These SLMs are challenging the prevailing belief that "bigger is better" in the AI world. In recent months, these more compact models have demonstrated capabilities that rival, and in some cases surpass, those of their larger counterparts on specific tasks. But first, let's define what a Small Language Model is.

Defining Small Language Models

A Small Language Model (SLM) can be thought of as a more streamlined version of larger generative AI models, trained on a cleaner or more specialized dataset and using far fewer parameters. In the research context, "small" often refers to models with fewer than 100 million parameters, and sometimes as few as 1-10 million[1]. However, this definition is evolving, and recent developments have introduced SLMs with over a billion parameters. For instance, in November 2023, Microsoft unveiled "Phi-2," a transformer-based SLM with 2.7 billion parameters[2] that delivers performance comparable to Meta's Llama-2, a 70-billion-parameter model. Google has also stepped into the arena with its lightweight Gemini Nano-1 and Nano-2 models, with 1.8 billion and 3.25 billion parameters, respectively[3].

Advantages of Small Language Models

The recent shift towards SLMs is not just a trend but a strategic move driven by several compelling advantages. These benefits position SLMs as increasingly preferable alternatives to Large Language Models for certain enterprise applications.

1. Greater Efficiency in Cost and Training Speed:

One of the major limitations of LLMs is their demanding computational requirements. Training an LLM can incur substantial costs, from acquiring computational infrastructure to the significant time investment required, which can translate into operational inefficiencies for enterprises. In contrast, SLMs are more resource-efficient: they can run on less powerful GPUs and complete both training and inference in a shorter time frame. For example, training Microsoft's Phi-2 took only 14 days on 96 A100 GPUs, whereas training GPT-3 (175 billion parameters) required over a month on 1,024 A100 GPUs. This efficiency can deliver quality results at a lower operating cost.
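To put those figures side by side, here is a rough back-of-the-envelope comparison; GPT-3's "over a month" is approximated as 34 days, so the exact ratio is illustrative rather than authoritative.

```python
# Back-of-the-envelope comparison of training compute, using the figures
# cited above. GPT-3's "over a month" is approximated as 34 days, so the
# resulting ratio is illustrative only.
phi2_gpu_days = 96 * 14      # Phi-2: 96 A100 GPUs x 14 days  = 1,344 GPU-days
gpt3_gpu_days = 1024 * 34    # GPT-3: 1,024 A100 GPUs x ~34 days = ~34,816 GPU-days

print(f"Phi-2:  {phi2_gpu_days:,} A100 GPU-days")
print(f"GPT-3: ~{gpt3_gpu_days:,} A100 GPU-days")
print(f"GPT-3 used roughly {gpt3_gpu_days / phi2_gpu_days:.0f}x more GPU-days")
```

Under these assumptions, GPT-3's training consumed on the order of 25x more GPU-days than Phi-2's, before even accounting for differences in per-GPU utilization or inference costs.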

2. Better Customizability:

While LLMs excel at providing a broad foundation for content generation and question-answering, SLMs offer opportunities for customization to enterprises developing domain-specific AI solutions. By training SLMs on proprietary or industry-specific datasets, companies can create models tailored to their unique terminologies, regulatory requirements, and user needs. For example, in the financial sector, an SLM could be specialized for generating market reports, creating model portfolios, and interpreting regulatory documents, while in the legal industry, a model might be trained for a deeper understanding of legal texts and case analysis[4].
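For illustration, here is a minimal sketch of what such domain-specific customization can look like in practice, using the Hugging Face transformers and peft libraries to attach LoRA adapters to a small open model. The model name, the dataset file (a hypothetical finance_reports.jsonl of in-house documents), and the hyperparameters are placeholders rather than recommendations; production fine-tuning would involve considerably more data preparation and evaluation.

```python
# Minimal sketch: LoRA fine-tuning of a small open model on domain-specific
# text. Model name, dataset file, and hyperparameters are illustrative only.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "microsoft/phi-2"             # any small causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Phi-2 defines no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach lightweight LoRA adapters so only a small fraction of weights train.
# The target module names below are specific to the Phi-2 architecture.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "dense"],
))

# Hypothetical in-house corpus: one JSON object per line with a "text" field.
data = load_dataset("json", data_files="finance_reports.jsonl")["train"]
data = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="slm-finance",
                           per_device_train_batch_size=2,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because LoRA updates only a small set of adapter weights, this kind of customization can run on a single modest GPU, which is part of what makes SLMs attractive for domain-specific use cases.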

3. Strengthened Security:

SLMs also offer improved security and data control for enterprise users. The models can process data locally or within controlled boundaries, reducing the risk of data leaking outside the organization. Since SLMs are typically trained on specific datasets, enterprises have better control over the quality and provenance of the training data, minimizing the risk of biased or malicious data influencing the model. For instance, when Microsoft trained its Phi-2 model, it meticulously selected "textbook-quality" data to ensure optimal performance. Additionally, the smaller codebase and fewer parameters of SLMs could lower the risk of security breaches and data poisoning compared to LLMs[5].
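As a concrete illustration of "processing data locally," here is a minimal sketch of fully local inference with the Hugging Face transformers library, assuming a locally cached small model (Phi-2 is used only as an example). The prompt, and any documents it references, never leave the machine.

```python
# Minimal sketch of fully local inference: the model weights are loaded from
# a local cache and the prompt never leaves the machine. The model name and
# prompt are illustrative placeholders.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-2",   # small enough for a single workstation GPU
    device_map="auto",         # place the model on available local hardware
)

prompt = "Summarize the key obligations in the following supplier contract:\n..."
result = generator(prompt, max_new_tokens=200, do_sample=False)
print(result[0]["generated_text"])
```

Keeping the entire inference path on infrastructure the enterprise controls is what makes SLMs appealing for workloads involving sensitive or regulated data.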

4. Enhanced Sustainability[6]:

The shift to Small Language Models could make AI more sustainable, primarily because their reduced computational requirements translate into a lower carbon footprint. A non-peer-reviewed study by Hugging Face illustrates this by comparing the carbon emissions of a 176B-parameter LLM (similar in size to GPT-3's 175B parameters) to real-world impacts: the emissions from using such a large model could amount to at least 50 metric tons of CO2, roughly equivalent to 60 flights between London and New York. In contrast, the compact size and optimized efficiency of SLMs result in significantly lower energy usage, marking an important step towards more eco-friendly AI technologies.

Conclusion

As we conclude the first part of our exploration of SLMs, it's clear that smaller models are answering the market's call for AI solutions that are efficient, versatile, and adaptable across a wide range of industries. These models are setting a new precedent by delivering strong performance without heavy resource investment.

The second part of this series will focus on implementation considerations, including relevant techniques, as well as future development trends for SLMs.

References

  1. https://www.techopedia.com/definition/small-language-model-slm

  2. Phi-2: The surprising power of small language models (Microsoft Research)

  3. https://blog.google/technology/ai/google-gemini-ai/

  4. https://www.techtarget.com/searchBusinessAnalytics/news/366546440/Small-language-models-emerge-for-domain-specific-use-cases

  5. https://code.pieces.app/blog/small-language-models-outshine-large-language-models-enterprise-users

  6. We’re getting a better idea of AI’s true carbon footprint | MIT Technology Review
