Atwood — RAG, fine-tuning, or better prompts? Choosing the right lever

There are three ways to make a model fit your work: prompting, retrieval, and fine-tuning. Most teams reach for the expensive one first. Here’s when each is the right call.

When a model isn't giving you what you need, there are three levers to pull: improve the prompt and context, add retrieval (RAG), or fine-tune. They get discussed as competitors. They're not — they solve different problems, and reaching for the wrong one is where time and money disappear.

Lever 1 — Prompting and context (start here)

Most "the model is wrong" problems are really "the model wasn't told enough." Clear instructions, the right context, examples of what good looks like, and constraints on the output fix a surprising share of issues — at zero training cost and with instant iteration. Example: a summary that keeps dropping key figures usually needs "always preserve every dollar amount and date," not a fine-tune. Exhaust this lever first, every time.
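To make that concrete, here is a minimal sketch of a prompt that carries instructions, explicit constraints, and optional examples. The function name `build_summary_prompt`, the prompt wording, and the sample document are assumptions made for illustration; sending the prompt to a model is left out because that call depends on your provider.

```python
def build_summary_prompt(document, constraints, examples=()):
    """Assemble instructions, constraints, optional examples, and the document."""
    parts = ["Summarize the document below."]
    if constraints:
        # Spell out the output rules instead of hoping the model infers them.
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    for source, summary in examples:
        # Each example shows the model what a good summary looks like.
        parts.append(f"Example document:\n{source}\nExample summary:\n{summary}")
    parts.append(f"Document:\n{document}")
    return "\n\n".join(parts)


# Hypothetical input, invented for this example.
prompt = build_summary_prompt(
    document="Q3 revenue was $4.2M, up from $3.9M in Q2. The audit closes on 2024-11-15.",
    constraints=[
        "Always preserve every dollar amount and date.",
        "Use at most three sentences.",
    ],
)
print(prompt)
```

Changing a constraint here is a one-line edit you can test in seconds, which is why this lever comes first.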

Lever 2 — RAG / retrieval (to ground answers in your data)

When the model needs to answer from your documents, policies, or live data rather than from its training, you need retrieval. RAG fetches the relevant passages at query time and grounds the answer in them, with provenance. This is the right lever for most enterprise use cases, because the problem is usually knowledge, not behavior. Example: "what does our travel policy say about international flights" needs your current policy retrieved and cited, not a model trained on last year's handbook. Retrieving from governed sources also keeps your data out of the model's weights, so access rules can be enforced on the source at query time.
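The retrieve-then-ground flow can be sketched in a few lines. This is a simplified illustration, not a production pipeline: it ranks documents by plain word overlap as a stand-in for the embedding or vector search a real system would likely use, and the document ids, policy text, and function names are all hypothetical.

```python
import re

# Hypothetical governed sources; ids and text are invented for illustration.
DOCUMENTS = {
    "travel-policy-2025": "International flights over six hours may be booked in premium economy with manager approval.",
    "expense-policy-2025": "Meals are reimbursed up to a daily limit when receipts are submitted within 30 days.",
    "security-handbook": "Laptops must use full disk encryption and a screen lock.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a stand-in for vector search)."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(text)), doc_id) for doc_id, text in documents.items()]
    # Drop documents with no overlap, then sort by score (high first) and id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


def build_grounded_prompt(query, documents, k=2):
    """Put the retrieved passages, tagged with their ids, in front of the question."""
    sources = retrieve(query, documents, k)
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in sources)
    return (
        "Answer using only the sources below and cite the source id in brackets. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


query = "What does our travel policy say about international flights?"
print(build_grounded_prompt(query, DOCUMENTS))
```

Because each passage travels with its id, the answer can cite where it came from, and updating the policy means updating the document store rather than retraining anything.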

Lever 3 — Fine-tuning (for consistent behavior at scale)

Fine-tuning changes how the model behaves (a consistent format, tone, or specialized task pattern) by training on examples. It's powerful but expensive: you need a quality dataset, a training and eval pipeline, and you repeat the work whenever the task or the base model changes. Reach for it when you need the same behavior thousands of times and prompting can't hold it. Example: classifying support tickets into your exact taxonomy at high volume with consistent labels is a fine-tuning job. Drafting one board packet is not.
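Much of that cost sits in the dataset. As a rough sketch, the ticket example might start from labeled pairs that are validated against the taxonomy and written out as JSON Lines. The taxonomy, the tickets, and the `prompt`/`completion` field names are assumptions for illustration; the record format a given training service expects will differ, so check its documentation.

```python
import json

# Hypothetical taxonomy and labeled tickets, invented for illustration.
TAXONOMY = {"billing", "login", "shipping"}

LABELED_TICKETS = [
    ("I was charged twice this month.", "billing"),
    ("The password reset email never arrives.", "login"),
    ("My order has been in transit for two weeks.", "shipping"),
]


def to_training_records(tickets, taxonomy):
    """Turn (text, label) pairs into records, rejecting labels outside the taxonomy."""
    records = []
    for text, label in tickets:
        if label not in taxonomy:
            # A mislabeled example teaches the model the wrong behavior.
            raise ValueError(f"unknown label: {label!r}")
        records.append({"prompt": f"Classify this support ticket: {text}", "completion": label})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


jsonl = to_jsonl(to_training_records(LABELED_TICKETS, TAXONOMY))
print(jsonl)
```

Three rows are only enough to show the shape. A real dataset needs far more examples, plus a held-out set for evaluation, and that is the pipeline cost the paragraph above refers to.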

The common mistake

Teams reach for fine-tuning when they need RAG. They want the model to "know our data," assume that means training, and sign up for an expensive pipeline — when retrieval would have grounded the answers faster, cheaper, and with citations. The rule of thumb: if the problem is knowledge, use RAG; if the problem is behavior, consider fine-tuning; and try prompting before either.
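The rule of thumb is simple enough to write down as a function. This is only a restatement of the rule in code; the name `choose_lever` and its two inputs are hypothetical, and real projects will have more nuance than two flags can capture.

```python
def choose_lever(prompting_exhausted, problem):
    """Apply the rule of thumb: prompting first, RAG for knowledge, fine-tuning for behavior.

    `problem` is either "knowledge" or "behavior".
    """
    if problem not in ("knowledge", "behavior"):
        raise ValueError(f"unknown problem type: {problem!r}")
    if not prompting_exhausted:
        # Try prompting before either of the other levers.
        return "prompting"
    if problem == "knowledge":
        return "rag"
    return "consider fine-tuning"


# "We want the model to know our data" is a knowledge problem.
print(choose_lever(prompting_exhausted=True, problem="knowledge"))
```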

Prompting changes what the model is told right now. RAG changes what it can look up. Fine-tuning changes how it behaves. Match the lever to the problem and most “we need to fine-tune” projects simply disappear.

The best systems use all three: strong prompting, RAG over your governed data, and fine-tuning only where it earns its keep — composed behind one governed, multi-model gateway.