A field guide to AI model types — and when to use each
“AI model” isn’t one thing. There’s a whole toolbox — language, reasoning, small, multimodal, embedding, image, and speech models — and good systems use several together.
People say "AI model" as if it means one thing. It doesn't. Each model type is good at a different job, and the systems that work well use several of them together, with the right tool at each step. Here's the field guide.
Large language models (LLMs)
The general-purpose workhorse: generate and transform text, follow instructions, summarize, draft, answer. When people say "AI" today they usually mean this. Use it for: drafting, summarizing, classification with reasoning, conversation, and orchestrating other tools.
Reasoning models
LLMs tuned to "think" before answering — spending extra compute on a chain of internal steps for hard problems. Slower and more expensive, but markedly better at math, complex code, and multi-step planning. Use it for: the genuinely hard analytical step — and not for "summarize this email," where you'd pay for thinking you don't need. The skill is knowing which steps deserve a reasoning model.
Small language models (SLMs)
Compact models that are fast, cheap, and can run on modest hardware or on-device. They won't write your strategy memo, but they're excellent at high-volume, well-defined tasks. Use it for: classification, routing, extraction, and first-pass triage in front of a bigger model. Example: an SLM decides which of fifty incoming documents needs the expensive model at all.
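The triage pattern in that example can be sketched roughly as follows. This is a minimal illustration, not a real integration: `stub_small_model` is a hypothetical stand-in for an actual SLM call (here it is just keyword rules), and the labels, the confidence threshold of 0.8, and all function names are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Triage:
    label: str         # "routine" or "complex" (labels are assumed for this sketch)
    confidence: float  # 0.0 to 1.0, as a small model might report it


def stub_small_model(text: str) -> Triage:
    """Hypothetical stand-in for an SLM call. Keyword rules are a placeholder only."""
    hard_markers = ("indemnif", "dispute", "exception")
    lowered = text.lower()
    if any(marker in lowered for marker in hard_markers):
        return Triage("complex", 0.9)
    if len(text.split()) < 4:
        return Triage("routine", 0.5)  # too little text to be sure
    return Triage("routine", 0.95)


def needs_large_model(result: Triage, threshold: float = 0.8) -> bool:
    """Escalate when the small model says 'complex' or is not confident enough."""
    return result.label == "complex" or result.confidence < threshold


def triage(
    docs: List[str],
    classify: Callable[[str], Triage] = stub_small_model,
) -> Tuple[List[str], List[str]]:
    """Split documents into (escalate to the big model, handled cheaply)."""
    escalate: List[str] = []
    handled: List[str] = []
    for doc in docs:
        target = escalate if needs_large_model(classify(doc)) else handled
        target.append(doc)
    return escalate, handled


docs = [
    "Please update my mailing address on file.",
    "We dispute the indemnification clause in section 4.",
    "Hi there",
]
escalate, handled = triage(docs)
```

Escalating on low confidence as well as on the "complex" label is a design choice: it lets the cheap model admit it is unsure instead of forcing a guess on every document.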
Multimodal models
Models that take more than text — images, PDFs, charts, sometimes audio or video — and reason across them. Use it for: reading a scanned contract, interpreting a chart, extracting data from a form, describing an image. Example: "pull the totals from this PDF invoice" is a multimodal job.
Embedding models
These don't generate text — they turn text (or images) into vectors that capture meaning, so you can search by similarity. They're the quiet engine behind retrieval. Use it for: semantic search and RAG — finding the right documents to ground an answer. Example: matching a member's question to the three most relevant policy passages.
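Similarity search over vectors can be illustrated with a short sketch. One loud assumption: `toy_embed` below is a word-count placeholder, not an embedding model, so it only matches on shared words. A real embedding model would return dense vectors that also capture paraphrases. The ranking step (cosine similarity, then keep the top k) is the part being shown, and the passages are invented for the example.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def toy_embed(text: str) -> Counter:
    """Placeholder 'embedding': sparse word counts. A real model returns dense floats."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: str,
    passages: List[str],
    k: int = 3,
    embed: Callable[[str], Counter] = toy_embed,
) -> List[str]:
    """Return the k passages most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps tie order
    return [passage for _, passage in scored[:k]]


passages = [
    "Refunds are issued within ten days of cancellation.",
    "Members may freeze an account for up to three months.",
    "Guest passes expire at the end of each month.",
    "Lockers are cleared every night.",
]
best = top_k("How do I freeze my account?", passages, k=1)
```

In a RAG setup, the passages returned by `top_k` would be placed in the prompt of the generating model so its answer is grounded in them.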
Image, speech, and specialized models
Image generation (diffusion) models create visuals from text. Speech models handle transcription (speech-to-text) and synthesis (text-to-speech). Code-specialized models are LLMs tuned hard for programming. Each is the right tool for its narrow job and overkill or useless outside it.
The point: real systems compose them
A serious workflow isn't one model — it's a pipeline. Example: an SLM classifies an incoming document, a multimodal model reads it, an embedding model retrieves the relevant policy, an LLM drafts the response, and a reasoning model checks the tricky calculation. A coordinator routes each step to the cheapest model that can do it well.
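The coordinator's rule, "cheapest model that can do it well," can be sketched as a lookup over a registry. Everything named here is hypothetical: the model names, the capability tags, and the relative cost numbers are invented for illustration, and "can do it well" is reduced to a simple capability set where a real system would rely on evaluation results.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ModelSpec:
    name: str
    capabilities: FrozenSet[str]  # steps this model is trusted to handle
    relative_cost: float          # illustrative units, not real prices


# Hypothetical registry; every entry is an assumption made for this sketch.
REGISTRY = [
    ModelSpec("slm", frozenset({"classify"}), 1.0),
    ModelSpec("embedder", frozenset({"retrieve"}), 0.5),
    ModelSpec("multimodal", frozenset({"read_document", "draft"}), 8.0),
    ModelSpec("llm", frozenset({"classify", "draft"}), 5.0),
    ModelSpec("reasoner", frozenset({"classify", "draft", "check_calculation"}), 20.0),
]


def cheapest_capable(step: str, registry: List[ModelSpec] = REGISTRY) -> ModelSpec:
    """Pick the lowest-cost model whose capabilities include the step."""
    candidates = [m for m in registry if step in m.capabilities]
    if not candidates:
        raise LookupError(f"no model registered for step: {step}")
    return min(candidates, key=lambda m: m.relative_cost)


def plan(steps: List[str]) -> List[Tuple[str, str]]:
    """Map each pipeline step to the name of the model that should run it."""
    return [(step, cheapest_capable(step).name) for step in steps]


PIPELINE = ["classify", "read_document", "retrieve", "draft", "check_calculation"]
routing = plan(PIPELINE)
```

With these made-up costs, the plan reproduces the example above: the reasoning model is capable of classifying and drafting too, but it only gets the calculation check because cheaper models cover the other steps.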
Asking "which AI model is best" is like asking "which tool is best." Best at what? The win isn't one model — it's the right model at each step, composed behind one governed path.
That composition, many model types behind one governed gateway, is how you get quality and cost-efficiency together. Routing every call through that single gateway is also what keeps your data from ever leaving its boundary.