Since ChatGPT came out, more AI companies like Anthropic have been rushing to build language models that can chat and handle tasks almost like humans. They’ve grabbed attention and billions in investment along the way.
And yet, for all that rushing, most enterprises are still figuring things out. According to a recent McKinsey report, 92 percent of companies plan to increase their AI investments over the next few years. Yet despite all that spending, only 1 percent of businesses have fully integrated AI at scale. Why? While LLMs are powerful, they can be costly and complicated to operate if not approached strategically.
This is where small language models (SLMs) are changing the game. Instead of defaulting to “bigger is better,” businesses are asking a better question: “What model is right for the job at hand?” SLMs often deliver similar results while using far fewer resources, making them a wise option.
In this article, we’ll go over what SLMs are, look at examples of small language models, and explain how they are faster, more focused, and easier to govern than LLMs. We will also discuss their role in the future of agentic AI and how Domo’s Bring Your Own Model (BYOM) strategy makes adopting SLMs seamless.
Let’s start with the basics by defining what a small language model is.
What are small language models?
A small language model works like a large language model by processing and generating human language, but with fewer parameters. They are called “small” because they have much less complexity than models like GPT-4.1 or Claude 4. While LLMs can have hundreds of billions or even trillions of parameters, SLMs usually operate with a few hundred million to a few billion.
LLMs are designed for broad, general-purpose use cases and often require significant compute resources. In contrast, SLMs focus on being efficient and targeted for specific tasks, making them faster, lighter, and easier to govern.
Here’s a quick breakdown of common SLM sizes:
- Sub-1-billion parameters: Ultra-efficient models designed for highly specific tasks.
- 1–4 billion parameters: A balanced option that offers solid performance without heavy resource demands.
- Up to 7 billion parameters: The top tier of small models, often delivering performance close to much larger models on targeted tasks.
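To make these size tiers concrete, you can roughly estimate the memory needed just to hold a model's weights from its parameter count and numeric precision. A back-of-the-envelope sketch (the function name here is illustrative, not a standard API):

```python
def model_memory_gb(n_params: float, bits: int = 16) -> float:
    """Approximate gigabytes needed just to store the weights:
    parameters x bits per parameter, converted to bytes, then to GB."""
    return n_params * bits / 8 / 1e9

# A 7B-parameter SLM at 16-bit precision needs about 14 GB for weights,
# while a 175B-parameter LLM at the same precision needs about 350 GB.
print(model_memory_gb(7e9))    # 14.0
print(model_memory_gb(175e9))  # 350.0
```

This is only the weight storage; inference also needs working memory, but the gap between tiers scales the same way, which is why sub-7B models can run on a single commodity GPU or even a laptop.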
Now, let’s explore when SLMs might be the right choice compared to larger models.
SLMs vs LLMs: When might you choose an SLM over an LLM?
In many cases, small language models offer a smarter, more targeted alternative to large models, especially when efficiency and control matter most.
- Speed and resource efficiency: LLMs typically require GPU clusters, which drive up operational and inference costs. SLMs need far less computing power for both training and inference (the process of generating an output), so they are faster, cheaper to run, and can work on standard hardware or even edge devices, reducing costs and energy use.
- Focused performance: While an LLM might excel at complex, open-ended reasoning or creative generation, an SLM fine-tuned for a specific domain (such as summarizing financial reports) can often match or even outperform it on that narrow task. For instance, the Diabetica-7B model, designed for diabetes-related inquiries, achieved an accuracy rate of 87.2 percent, surpassing GPT-4 and Claude-3.5.
- Governance and security advantages: SLMs offer a key advantage for enterprises by running entirely within their own infrastructure. Their smaller size allows deployment behind internal firewalls, keeping sensitive data, customer details, financial records, and trade secrets in-house. This simplifies security management, supports compliance with regulations like GDPR and HIPAA, and ensures full control over organizational data.
Patrick Buell, chief innovation officer at Hakkoda, clarifies the strategic choice during the Agentic AI Summit: “SLMs are like ants carrying grains of sand efficiently in an anthill, while LLMs are elephants—powerful but often overkill for specific enterprise tasks.” For most well-defined business processes, you don’t need an elephant; you want a coordinated team of ants.
With an understanding of when to use an SLM, let’s go over how SLMs can achieve performance similar to LLMs without compromising effectiveness.
How SLMs are built: Shrinking without breaking
Building a small language model is about retaining as much capability as possible while making the model smaller and more efficient. SLMs share the same transformer architecture that LLMs use to understand context, but apply compression techniques to optimize it. These techniques help “shrink” a model without “breaking” its core functionality.
Here are the common shrinking methods.
Knowledge distillation
Knowledge distillation (KD) involves a smaller “student” model that learns from a larger, pre-trained “teacher” model. The teacher model is trained on a large amount of data and captures the important patterns. The student model then learns directly from the teacher’s outputs rather than from the raw data alone.
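The core training signal can be sketched in a few lines: the student is trained to match the teacher's softened probability distribution, typically via a KL-divergence loss. A minimal plain-Python illustration (real distillation would use a framework like PyTorch; the names here are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature
    (higher temperature spreads probability mass more evenly)."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's -- the quantity minimized during knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so gradient descent on it pulls the small model toward the large model's behavior.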

Pruning
Pruning involves identifying and removing redundant or non-critical parameters (or connections) within the model’s neural network after it has been trained. This reduces the model’s size and computational load, often with minimal impact on its performance for its target tasks.
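The simplest form of this is magnitude pruning: after training, zero out the weights with the smallest absolute values, on the assumption that they contribute least to the output. A toy sketch over a flat list of weights (real pruning operates on tensors, often layer by layer):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute value -- unstructured magnitude pruning in miniature."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the weights, ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned
```

The zeroed weights can then be stored and computed sparsely, which is where the size and speed savings come from.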

Quantization
Quantization focuses on making the model more lightweight by reducing the numerical precision used to store the model’s parameters (weights). For example, instead of using 32-bit floating-point numbers, a model might be quantized to use 16-bit or even 8-bit integers. This cuts down on memory and storage requirements and makes low-precision inference much faster and more energy-efficient.
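A minimal sketch of symmetric 8-bit quantization: scale the weights so the largest magnitude maps to 127, round to integers, and keep the scale factor so the floats can be approximately recovered. (Production quantization schemes are more sophisticated; this assumes at least one nonzero weight.)

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats into the int8 range [-127, 127]
    with a single scale factor; return (quantized ints, scale)."""
    scale = max(abs(w) for w in weights) / 127  # assumes a nonzero weight
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 representation."""
    return [q * scale for q in quantized]
```

Each weight now takes 1 byte instead of 4, at the cost of a small rounding error bounded by half the scale factor.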

The best small language models available
The SLM ecosystem is growing quickly, with major tech companies and the open-source community releasing highly capable models. Here are some well-known examples of small language models:
- Microsoft’s Phi series: This family of models has set new benchmarks for performance at small parameter counts. For example, Phi-2, with 2.7 billion parameters, delivers results comparable to 30B parameter models in common sense reasoning and code generation while running up to 15 times faster. Similarly, Phi-3 Small, with 7 billion parameters, is optimized for reasoning and code tasks and matches the performance of much larger 70B parameter models.
- DeepSeek’s distill series: DeepSeek’s Distill models show how knowledge from much larger models can be compressed into smaller, more efficient versions. For example, the DeepSeek-R1-Distill-Qwen-7B model outperforms large closed-source models like Claude-3.5-Sonnet-1022.
- NVIDIA’s Nemotron-H family: This series includes the Nemotron-H-8B-Base and Nemotron-H-8B-Instruct models. They combine transformers with other designs, such as Mamba, to optimize performance and speed.
- HuggingFace’s SmolLM2 series: The SmolLM2 family consists of compact language models with 135 million, 360 million, and 1.7 billion parameters. For instance, SmolLM2-1.7B is a state-of-the-art “small” language model trained on specialized open data sets like FineMath, Stack-Edu, and SmolTalk.
The availability of the best small language models gives enterprises the flexibility to choose models that best fit their specific needs. Let’s explore how SLMs power agentic AI.
Small language models in agentic AI: Narrow tasks deserve narrow models
One of the most exciting ways SLMs are making an impact is in agentic AI. This shift moves AI from simply answering questions to actually taking action and getting things done on its own.
So, what is agentic AI?
Agentic AI refers to a system that can handle tasks on its own. It can reason, plan, and execute multi-step tasks to achieve a specific outcome with little to no human intervention.
Instead of a single giant model trying to do everything, an agentic system acts as an orchestrator. It takes a big goal, breaks it down into smaller steps, and then assigns each step to the right agent. And very often, the engine powering each of those focused agents is a specialized small language model.
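That orchestrator pattern can be sketched as a toy: a router takes a list of sub-tasks and dispatches each to a specialist agent. Here the “agents” are plain functions standing in for SLM-backed services, and the names and routing keys are purely illustrative:

```python
# Specialist "agents" -- in a real system each would wrap a fine-tuned SLM.
AGENTS = {
    "classify": lambda text: "billing" if "invoice" in text else "general",
    "summarize": lambda text: text[:30] + ("..." if len(text) > 30 else ""),
}

def orchestrate(steps):
    """Route each (task_type, payload) step to its specialist agent and
    collect the results -- the orchestrator pattern in miniature."""
    return [AGENTS[task_type](payload) for task_type, payload in steps]
```

Because each agent only needs to do one narrow thing well, each slot in the `AGENTS` table is a natural fit for a small, cheap, purpose-tuned model rather than one giant generalist.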
Agentic AI maturity journey
To understand where all of this is heading, it helps to have a roadmap. At the Agentic AI Summit, Patrick Buell outlined a multi-stage journey that businesses will likely take as they move toward this autonomous future. Below, we summarize Buell’s stages of the journey:
1. Augmentation (the “Iron Man Suit”)
This is where most organizations are today, using AI to augment human work. The focus is on improving productivity by automating lower-value work so people can focus on the higher-value tasks.
This stage typically involves “max one to two agents,” often an AI app that summarizes your meeting notes or helps draft your emails. This is a perfect use case for an SLM, a small, fast, and efficient model dedicated to doing one thing exceptionally well.
2. Orchestration
The next step is to automate more complex workflows that involve a series of tasks. This is where an agent might take a process that used to be manual and handle it from start to finish. However, this stage maintains a “human-in-the-loop” approach, where a person provides the final approval before a critical action is taken.
For example, an agent powered by a compliance-focused SLM might flag a problematic clause in a contract, but it would wait for a lawyer to approve the change.
3. Swarms of agents
This is where things truly transform. Patrick explains that this stage requires us to rethink our entire business processes. Instead of just automating the old way of doing things, we create systems that focus entirely on results.
He describes this as moving toward “swarms of agents.” Thousands of specialized agents are all working together toward a single goal. Some agents might focus on security, others on ethics, and others on business efficiency.
Each agent in the swarm is a narrow specialist, making it the ideal job for a dedicated SLM. In this phase, humans act more like “community planners.” They set the goals and let the swarm figure out the best way to achieve them.
4. Self-Design
This is the final stage in the vision, where the system becomes a “living organization” capable of self-design. At this point, the system might have tens or even hundreds of thousands of agents running. People would act more like a board of directors or research scientists, guiding the ecosystem and testing new ideas.
Why SLMs will matter most for agentic applications
Agentic AI demands efficiency, flexibility, and control, which are areas where SLMs excel. Here’s how they enable scalable, secure, and responsive automation.
- Cost and compute efficiency: Running thousands of AI agents only works if each one is cheap to run. SLMs keep compute demands low and reduce operational expenses.
- Deployment flexibility: SLMs can be deployed almost anywhere, from your company’s servers to local devices. This allows you to automate tasks right where they happen.
- Data security: Agentic systems often handle sensitive data. SLMs allow companies to keep everything on internal infrastructure, maintaining full data control.
- Latency reduction: Many automated tasks require real-time responses. Running SLMs locally or in the cloud delivers faster results than relying on large, remote models.
- Traceability and governance: When an agent makes a decision, you should know why it was made. The focused nature of SLMs makes their behavior more predictable and easier to audit, which is essential for good governance.
Real-world agentic use cases for SLMs
SLMs are already transforming industries by powering intelligent agents that automate complex workflows. Here are some real-world examples where SLMs get work done faster and smarter:
- Customer support automation: An agent can classify an incoming support ticket, route it to the right department, and draft a response using a fine-tuned SLM. For example, a global telecom operator implemented an SLM-driven agent to automatically resolve common technical support inquiries. This resulted in a 40 percent decrease in human call volume.
- HR applications: SLMs help HR teams by screening resumes, drafting offer letters, and summarizing employee engagement survey results. For example, an SLM-powered agent could scan hundreds of resumes overnight and shortlist top candidates based on predefined criteria.
- Logistics: Logistics operations require rapid and reliable customer support, as well as internal task automation. An agent can handle shipment processes and notify customers without human intervention. For example, a logistics firm replaced a cloud-based LLM chatbot with an SLM-powered solution. This helped decrease response times by 37 percent, and the firm saw an improvement in compliance and performance.
- Data processing: An SLM-powered agent can monitor a flow of information, extract key details like names and dates, and organize them in a database.
- Workflow automation: An agent can use multiple SLMs to automate multi-step regulatory workflows. For example, partnering with Bayer, Microsoft deployed E.L.Y. Crop Protection SLM (Phi-3), trained on regulatory documents and agricultural data to guide farmers on pesticide usage. It processes questions about label instructions and compliance, streamlining regulatory workflows and agentic decision-making in farming.
- Financial process automation: Sarvam AI, in partnership with Infosys, developed “Topaz BankingSLM” and “ITOpsSLM” using NVIDIA’s AI technology. These models are trained on over 2 million pages of internal banking data. An agent powered by a similar SLM can check financial transactions for compliance, flag anything that looks unusual, and even help prepare reports for auditors, all while keeping sensitive data secure.
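As a concrete miniature of the data-processing pattern above, here is an agent-style extraction step that pulls ISO-format dates out of free text and returns them for filing into a database. In practice the SLM handles far messier, unstructured input; a regex stands in for the model in this sketch:

```python
import re

# Hypothetical extraction step: find ISO-format dates (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

def extract_dates(text: str) -> list[str]:
    """Return every ISO date found in the text, in order of appearance."""
    return DATE_PATTERN.findall(text)
```

A monitoring agent would run a step like this over each incoming document, then hand the structured results to the next agent in the workflow.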
Next, let’s explore how Domo makes SLM deployment seamless.
Domo’s hybrid AI strategy: Bringing your own model (BYOM) for SLMs
Knowing that SLMs are a great fit is one thing. Having a platform to actually use them safely and effectively is another. At Domo, our hybrid AI strategy gives you the power to use the right model for the right task without sacrificing control. We make it easy for you to bring your own model, including any SLM you want to use.
- Integrate your own models: Domo’s AI service layer gives you the flexibility to connect your own models. You can use small language models from Hugging Face, custom models built in Jupyter Notebooks, or APIs from OpenAI. Whatever works best for you, we make it simple to integrate. This ensures you are never locked into a single vendor and can always use the best-in-class model for the job.
- Manage with governance and control: Domo provides the thorough, enterprise-grade governance framework enterprises require. You can configure model endpoints, monitor performance and data drift over time, edit the input/output schema, and control exactly who and what is allowed to use them.
- Bring models to your data: Domo’s BYOM capability lets your SLMs run right where your data lives. This means your sensitive enterprise data never has to leave Domo’s secure environment to be processed by an external model.
- Use your AI everywhere: Once your SLM is part of Domo, you can use it across your entire workflow. Enrich your data pipelines with Magic ETL, bring real-time predictions into your dashboards and apps, or build custom solutions powered by Domo Bricks.
Catch up on the Agentic AI Summit replay to explore how SLMs can transform your enterprise AI strategy.
Author

Haziqa Sajid, a data scientist and technical writer, loves to apply her technical skills and share her knowledge and experience through content. She holds an MS in data science and has over five years of experience as a developer advocate for AI and data companies.