Small Models, Big Impact (Part II)

Small Language Models Make Big Waves

Jackson Chen, Tensility Intern and MBA candidate at Northwestern University Kellogg School of Management
Armando Pauker, Managing Director at Tensility Venture Partners
Wayne Boulais, Managing Director at Tensility Venture Partners

Mohamed AlTantawy, CTO & Founder at Agolo

In the first installment of our series, we delved into the fundamentals of Small Language Models (SLMs) and explored their growing importance for enterprise users. As we continue our journey in this second part, we'll dive deeper into the training methods of SLMs and examine the future trajectory of their development.

Training Small Language Models

The training methodologies for Small Language Models (SLMs) are pivotal to their effectiveness and efficiency. Several approaches have emerged that accelerate how these models are developed:

Transfer Learning[1]:

A cornerstone of SLM training is transfer learning. A model is first pre-trained on a data-rich task and then fine-tuned on a more specific downstream task. As the graphs below illustrate, transfer learning lets SLMs leverage pre-existing knowledge more efficiently than traditional machine learning approaches, allowing them to be adapted quickly for particular applications. This not only speeds up training but also improves performance on the target task, making transfer learning a practical, resource-efficient approach to SLM development.

Traditional machine learning vs. Transfer learning. Source: “Transfer Learning - Machine Learning's Next Frontier” by Sebastian Ruder, ruder.io
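To make the idea concrete, here is a toy sketch of transfer learning in plain Python: a "backbone" stands in for a model already pre-trained on a data-rich task and is kept frozen, while only a small task-specific head is trained on the downstream data. The backbone, head, and toy task are all illustrative, not any particular SLM.

```python
# A toy "pre-trained backbone": a fixed feature extractor standing in for
# a model already trained on a data-rich task. In real transfer learning
# the backbone's weights are frozen (or fine-tuned at a low learning rate).
def frozen_backbone(x):
    return [x, x * x]  # fixed "features"; never updated below

def train_head(examples, epochs=200, lr=0.05):
    """Fit only a small task-specific head on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in examples:
            feats = frozen_backbone(x)
            err = sum(wi * fi for wi, fi in zip(w, feats)) + b - y
            # Gradient step touches the head only; the backbone is untouched.
            w = [wi - lr * err * fi for wi, fi in zip(w, feats)]
            b -= lr * err

    def predict(x):
        feats = frozen_backbone(x)
        return sum(wi * fi for wi, fi in zip(w, feats)) + b

    return predict

# Small downstream dataset: y = 2x^2 + 1, learnable from few examples
# because the backbone already supplies the useful x^2 feature.
data = [(x / 10, 2 * (x / 10) ** 2 + 1) for x in range(-10, 11)]
model = train_head(data)
```

Because the hard part (extracting useful features) was done during pre-training, only the tiny head needs to be fit, which is the source of the speed and resource savings described above.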

Utilization of Synthetic Data (with cautious consideration)[2][3]:

Another useful component in SLM training is synthetic data: data generated artificially (typically by statistical methods or AI models) rather than obtained by direct measurement or real-world collection. This is especially valuable in sensitive domains like healthcare, where it can safeguard patient data while still supporting work such as clinical trials. Synthetic data can expedite SLM training and improve model performance, offering a viable complement to real-world data alone. However, as we discussed in our previous blog post, "The Double-Edged Sword of Synthetic Data in AI Training," the method is not without challenges. Synthetic data is invaluable for filling gaps in training sets, but injudicious use can degrade model quality. Its benefits must be balanced with a careful approach to integration so that model accuracy and reliability are maintained.
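As a minimal illustration of the fit-then-sample pattern (not any production synthetic-data tool), the sketch below fits simple per-column Gaussians to a handful of invented "real" records and then draws artificial records from those distributions.

```python
import random
import statistics

def fit_gaussians(rows):
    """Estimate a mean and standard deviation per column from real records."""
    cols = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in cols]

def sample_synthetic(params, n, seed=0):
    """Draw artificial records from the fitted per-column distributions."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n)]

# Invented "real" records, e.g. (age, systolic blood pressure), purely
# illustrative of a sensitive dataset one might not want to share directly.
real = [[34, 118], [51, 131], [47, 125], [29, 112], [62, 140], [45, 127]]
synthetic = sample_synthetic(fit_gaussians(real), n=100)
```

Real synthetic-data generators model joint distributions (correlations between columns) and add privacy guarantees; this independence-assuming sketch only conveys the basic idea of replacing sensitive records with statistically similar artificial ones.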

Efficient Pre-Training Techniques[4]:

In machine learning, "pre-training" refers to an initial phase in which a model is trained on an extensive dataset, typically millions or billions of examples spanning a wide range of subjects. Its purpose is to equip the model with basic skills in language comprehension or image recognition, providing the foundation for an SLM. A significant advancement in this area is UL2 ("Unifying Language Learner"), a pre-training method Google introduced in 2022 that improves language model performance across a variety of datasets and settings.

Building on UL2 is UL2R (UL2 Repair), an additional stage of continued pre-training that requires only a relatively small amount of compute. In the experiment shown in the graph below, Google demonstrated that applying UL2R to an intermediate checkpoint of PaLM 540B matched the performance of the final PaLM 540B checkpoint while using half the compute, saving roughly 4.4 million TPUv4 hours. This underscores the potential for smaller models to reach high levels of performance without the extensive investment typically required for LLMs, and it echoes our discussion in Part I of the cost and time savings that make SLMs appealing across sectors.

Compute versus model performance of Google’s PaLM 540B and U-PaLM 540B (which adopted Google’s UL2R training) on 26 NLP benchmarks. Source: “Better Language Models Without Massive Compute”, Jason Wei and Yi Tay, Research Scientists, Google Research, Brain Team
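For intuition, the sketch below mimics the kind of training data UL2's "mixture of denoisers" produces: denoising objectives that differ in corruption rate and span length (the paper's R-, X-, and S-denoisers). The configurations and mask-token format here are simplified stand-ins, not Google's actual settings.

```python
import random

# Simplified stand-ins for UL2's denoiser configurations (not Google's
# actual hyperparameters): each objective masks text differently.
DENOISERS = {
    "R": {"span_len": 3, "corrupt_rate": 0.15},    # regular: short spans
    "X": {"span_len": 12, "corrupt_rate": 0.50},   # extreme: heavy corruption
    "S": {"span_len": None, "corrupt_rate": 0.25}, # sequential: mask a suffix
}

def make_example(tokens, mode, rng):
    """Build one (corrupted_input, targets) pair for a denoising objective."""
    cfg = DENOISERS[mode]
    if mode == "S":  # prefix-LM style: keep a prefix, predict the rest
        cut = int(len(tokens) * (1 - cfg["corrupt_rate"]))
        return tokens[:cut] + ["<mask_0>"], tokens[cut:]
    span = cfg["span_len"]
    n_spans = max(1, round(len(tokens) * cfg["corrupt_rate"] / span))
    # Sample span-aligned start positions so masked spans never overlap.
    starts = sorted(rng.sample(range(0, len(tokens) - span, span), n_spans))
    corrupted, targets, prev = [], [], 0
    for i, s in enumerate(starts):
        corrupted += tokens[prev:s] + [f"<mask_{i}>"]
        targets += [f"<mask_{i}>"] + tokens[s:s + span]
        prev = s + span
    corrupted += tokens[prev:]
    return corrupted, targets

tokens = [f"tok{i}" for i in range(40)]
inp, tgt = make_example(tokens, "X", random.Random(0))
```

Mixing these objectives during pre-training (and, in UL2R, during a short continued pre-training stage) exposes one model to several complementary denoising tasks instead of a single objective.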

Open-source SLMs

In the dynamic world of language models, open-source communities are key drivers of SLM development. Open-source models stand in contrast to closed models, which keep their code, weights, and training details confidential. Over the past year, the open-source community has witnessed a surge in generative AI projects: over 8,000 have emerged on platforms like GitHub, ranging from well-known models like Meta's Llama 2 to various experimental applications[5].

One significant contribution to this movement comes from Mistral AI, a French startup that, after raising $113 million in seed funding in June 2023, released its Mixtral 8x7B model in December 2023. This sparse mixture of experts (SMoE) model has been shown to outperform Llama 2 70B on most benchmarks with six times faster inference, and it matches or exceeds GPT-3.5 in many areas[6].
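The routing idea behind a sparse mixture of experts can be sketched in a few lines: a router scores every expert for each token, but only the top two actually run, which is why a model like Mixtral activates only a fraction of its total parameters per token. The toy experts and router scores below are invented for illustration.

```python
import math

def softmax(xs):
    """Standard numerically-stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def smoe_layer(token, router_scores, experts, top_k=2):
    """Route a token to its top_k experts and mix their outputs."""
    ranked = sorted(range(len(experts)),
                    key=lambda i: router_scores[i], reverse=True)[:top_k]
    gates = softmax([router_scores[i] for i in ranked])  # renormalize top-k
    # Only the selected experts execute; this is the source of the speedup.
    return sum(g * experts[i](token) for g, i in zip(gates, ranked))

experts = [lambda x, s=s: s * x for s in range(1, 9)]  # 8 toy "experts"
scores = [0.1, 2.0, 0.3, 0.0, 1.5, 0.2, 0.1, 0.4]     # router logits, invented
out = smoe_layer(10.0, scores, experts)
```

With top-2 routing over 8 experts, each token pays the compute cost of roughly a quarter of the layer's parameters, which is how an SMoE model can carry large total capacity while keeping inference fast.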

Another model is TinyLlama, released earlier this year after four months of training. This compact model, with 1.1 billion parameters and trained on approximately 1 trillion tokens, is extremely efficient, requiring only 550MB of RAM, and it was trained using just 16 A100-40G GPUs. TinyLlama outperforms other open-source SLMs such as OPT-1.3B and Pythia-1.4B[7].
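A quick back-of-the-envelope calculation puts that memory figure in context. One assumption here: 550MB corresponds to 4-bit quantized weights, since 1.1 billion parameters at 4 bits each come to exactly that; unquantized fp16 weights would need roughly four times as much.

```python
# Back-of-the-envelope check on TinyLlama's memory footprint.
# Assumption: the ~550MB figure reflects 4-bit quantized weights.
PARAMS = 1.1e9  # TinyLlama parameter count

def weight_mb(n_params, bits_per_param):
    """Weight storage in megabytes: bits -> bytes -> MB (decimal)."""
    return n_params * bits_per_param / 8 / 1e6

fp16_mb = weight_mb(PARAMS, 16)  # about 2200 MB
int4_mb = weight_mb(PARAMS, 4)   # about 550 MB
```

This kind of arithmetic is a useful first filter when judging whether a given SLM can fit on a phone or laptop, before accounting for activation memory and runtime overhead.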

SLMs appeal to open-source developers for several reasons. The collaborative nature of open-source development allows for more user-driven input than traditional closed models permit. Open-source models are also more adaptable to niche applications and edge cases, such as local language support. They can incorporate customized security controls and are capable of running efficiently on local hardware.

From Data Centers to Edge Computing

The rise of SLMs has accompanied the migration of AI workloads from centralized data centers to edge devices, particularly consumer smartphones. This migration makes sophisticated AI capabilities more accessible and more deeply integrated into everyday technology.

Apple's Research Initiative[8]:

Apple's latest research, titled "LLM in a Flash," underscores its commitment to advancing edge AI. The initiative focuses on methods for efficient inference of LLMs on devices with limited memory, an approach that could revolutionize how AI is integrated into Apple's suite of products, making sophisticated AI functionalities available on consumer devices without compromising performance.

Qualcomm's Stable Diffusion Demo[9]:

Qualcomm, a major player in smartphone chip design, has made significant strides by demonstrating the capability of running Stable Diffusion, a deep-learning text-to-image model, directly on smartphones. This achievement, showcased in early 2023, allows the generation of images in under a second, highlighting the potential for real-time AI applications on standard consumer devices.

Google's On-Device Optimization with Gemini Nano[10]:

Google's Gemini Nano, an SLM with 3.25 billion parameters, is designed for on-device tasks. It has been optimized for functions like smart reply in Gboard and summarizing in Recorder, demonstrating Google's commitment to integrating AI into everyday user experiences. Notably, Gemini Nano operates on Google's Pixel 8 Pro, powered by the in-house Google Tensor G3 processor.

Samsung's Gauss Model and the AI Phone Concept[11]:

Samsung is making significant strides with its generative AI model, Gauss. The model is reportedly set to run on the upcoming Galaxy S24 series, which Samsung claims will be the world's first "AI phone." Samsung clearly has ambitions to lead in the integration of AI at the edge, particularly in the consumer smartphone market.

In Conclusion

SLMs are revolutionizing the AI landscape with their efficiency, customizability, and reduced resource demands, fundamentally changing how enterprises harness AI. These models, adaptable across industries like healthcare, finance, and legal, lower the financial and technological barriers, enabling the creation of powerful, enterprise-specific AI tools. The rapid expansion of the number and type of small language models should lead to a significant leap in the accessibility, variety, and integration of AI applications in mobile and other devices.

References

  1. Transfer Learning - Machine Learning's Next Frontier (ruder.io)

  2. What is synthetic data? | IBM

  3. https://www.tensilityvc.com/insights/the-double-edged-sword-of-synthetic-data-in-ai-training

  4. Better Language Models Without Massive Compute – Google Research Blog

  5. A developer's guide to open source LLMs and generative AI – The GitHub Blog

  6. Mixtral of experts – Mistral AI

  7. https://arxiv.org/pdf/2401.02385.pdf

  8. https://arstechnica.com/apple/2023/12/apple-wants-ai-to-run-directly-on-its-hardware-instead-of-in-the-cloud/

  9. AI will soon run directly on your phone – Axios

  10. https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

  11. https://www.techradar.com/phones/the-samsung-galaxy-s24-could-be-its-first-ai-phone-heres-what-that-means
