Atwood — The open-weight model landscape

Open-weight models now rival the closed frontier, and you can run them in your own environment. Here’s how to evaluate them, who the current leaders are, and what the licensing fine print says, as of mid-2026.

Open-weight models, whose weights you can download and run yourself, have gone from "good enough for experiments" to genuinely rivaling the closed frontier. For regulated organizations that's a big deal: an open model can run inside your environment, so sensitive data never has to leave it. This is a fast-moving space, so treat the specifics below as a snapshot as of mid-2026: the names will shift, but the way you evaluate them won't.

Why open models matter

Privacy and control — run it in your own cloud or on-prem; data never touches a third party.
Cost — no per-token API bill; you pay for compute you control.
No lock-in — you're not hostage to one vendor's pricing or deprecation schedule.
Auditability — you can inspect, pin, and reproduce exactly what you ran.

The trade-off is that you operate it: hardware, serving, scaling, and updates become your problem (or your partner's).
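The auditability point above can be made concrete. As a minimal sketch, assuming your weights sit in a local file and you recorded its SHA-256 digest when you first approved the model, pinning amounts to re-checking that digest before serving. The function names here are illustrative, not part of any serving stack.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large weight files need not fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file on disk matches the digest recorded earlier."""
    return sha256_of_file(path) == expected_sha256.lower()
```

Storing the digest next to the evaluation results you approved gives you a record of exactly which artifact those results apply to.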

How to evaluate one (the durable part)

Ignore the leaderboard horse race and ask about your own work: capability on your actual tasks (not a generic benchmark), license (can you use it commercially, and with your data?), size and hardware (will it fit your GPUs?), context length, multilingual needs, and agentic or coding strength if that's your use case. The "best" model is the smallest one that clears your bar on your work.
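Two of these checks reduce to simple arithmetic. The sketch below estimates whether a model's weights fit in GPU memory (parameters times bytes per parameter) and then picks the smallest candidate that clears a quality bar. The candidate names and scores are hypothetical, and the 1.2 overhead multiplier for KV cache and activations is an assumption you should replace with your own measurements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str               # hypothetical label, not a real model
    params_billions: float  # TOTAL parameters (for MoE models, all experts)
    task_score: float       # score on your own evaluation set, 0.0 to 1.0


def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the weights alone: parameters x bytes per parameter."""
    return params_billions * bits_per_param / 8


def fits(candidate: Candidate, gpu_memory_gb: float, bits_per_param: int,
         overhead: float = 1.2) -> bool:
    """Rough fit check. `overhead` is an assumed allowance, not a measured value."""
    needed = weight_memory_gb(candidate.params_billions, bits_per_param) * overhead
    return needed <= gpu_memory_gb


def smallest_that_clears(candidates: Sequence[Candidate], bar: float,
                         gpu_memory_gb: float,
                         bits_per_param: int) -> Optional[Candidate]:
    """Smallest model that meets the quality bar and fits the hardware, else None."""
    eligible = [
        c for c in candidates
        if c.task_score >= bar and fits(c, gpu_memory_gb, bits_per_param)
    ]
    return min(eligible, key=lambda c: c.params_billions, default=None)
```

By this arithmetic, a 70-billion-parameter model quantized to 4 bits needs about 35 GB for weights alone, against 140 GB at 16 bits. Quantization can lower task quality, so score the candidate at the precision you actually plan to serve.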

The current leaders (mid-2026)

DeepSeek (V4) — a standout for agentic coding, competitive with the closed frontier on real software tasks; permissive MIT license.
Qwen (3.5 / 3.6, Alibaba) — Apache-2.0, strong multilingual coverage, and excellent small dense models you can run cheaply.
GLM (Zhipu) — strong all-rounder with one of the cleanest MIT licenses.
Kimi K2 (Moonshot) — among the top open models overall on neutral indices.
Llama 4 (Meta) — broadest ecosystem and tooling, with variants reaching enormous context windows.
Gemma 3 (Google) — small and RAM-efficient, a good fit for local and memory-constrained deployment.
Mistral (Large / Small, France) — strong European option, with sizes that fit a single consumer GPU when quantized.

A notable shift: labs like DeepSeek, Moonshot, Zhipu, and Alibaba now hold many of the top open-weight positions — the open frontier is no longer US-centric.

The licensing fine print

"Open weight" is not the same as "open source." Some models ship under clean, permissive licenses (Apache-2.0 or MIT: Qwen, DeepSeek, GLM); others use a community license with conditions on scale or use (Llama). Before you build on one, read the license against your specific situation: commercial use, your data, and redistribution all matter, and the decision belongs with someone who will own the answer.

The right question isn't “open or closed?” It's “which model, open or closed, is the best fit for this task and this data — and can I prove how it's governed?”

How we use them

We route across open and closed models behind one governed gateway. Open models earn their place on the most sensitive workloads — the ones that should never leave your environment — and on cost-sensitive, high-volume steps. Closed frontier models earn theirs where they're still ahead. Picking per task, privately, is the whole point. (For the different model types in that mix, see the field guide.)
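The routing idea can be sketched as a small policy function. This is an illustration under stated assumptions, not a description of the actual gateway: the backend labels, the `Task` fields, the tie-break when a task is both high-volume and needs frontier capability, and the default for unflagged tasks are all hypothetical choices.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3  # data that should never leave your environment


@dataclass(frozen=True)
class Task:
    name: str
    sensitivity: Sensitivity
    high_volume: bool = False     # cost-sensitive, many calls
    needs_frontier: bool = False  # closed models still measurably ahead here


OPEN = "open-self-hosted"       # hypothetical backend label
CLOSED = "closed-frontier-api"  # hypothetical backend label


def route(task: Task) -> str:
    """Pick a backend per task. Sensitivity is checked first and always wins."""
    if task.sensitivity is Sensitivity.RESTRICTED:
        return OPEN
    if task.needs_frontier:
        # Assumed tie-break: capability outranks cost for non-restricted work.
        return CLOSED
    if task.high_volume:
        return OPEN
    # Assumed default for everything else; choose your own.
    return CLOSED
```

Checking sensitivity before anything else means no later rule can send restricted data to a third party, whatever other flags a task carries.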

Models and benchmarks change monthly; verify current options before committing.
