The Well-Architected lens, applied to AI systems
The cloud Well-Architected frameworks predate the AI boom — but their pillars map cleanly onto what a governed AI system actually needs.
AWS, Azure, and Google each publish a "Well-Architected" framework — a set of pillars for building cloud systems that hold up in production. They were written for general workloads, but they translate almost directly to AI systems, with one addition. Here's the lens, pillar by pillar.
Security
For an AI system, security is the governed gateway: authentication, PII stripping at the boundary, scoped tools, egress filtering, and an audit record for every request. This is also where the OWASP Top 10 for LLM Applications applies. Example: least-privilege tools, so an agent can draft a payment but not issue one.
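One way to picture least-privilege tools is a deny-by-default registry. The sketch below is illustrative only: `ToolRegistry`, `draft_payment`, `issue_payment`, and the scope set are hypothetical names, not part of any particular agent framework.

```python
from dataclasses import dataclass, field


class ToolNotPermitted(Exception):
    """Raised when an agent calls a tool outside its granted scope."""


@dataclass
class ToolRegistry:
    # Maps tool name -> callable. All tool names here are hypothetical.
    tools: dict = field(default_factory=dict)

    def register(self, name, fn):
        self.tools[name] = fn

    def call(self, granted_scopes, name, **kwargs):
        # Deny by default: a tool runs only if its name was explicitly granted.
        if name not in granted_scopes:
            raise ToolNotPermitted(f"tool '{name}' is not in the agent's scope")
        return self.tools[name](**kwargs)


def draft_payment(payee, amount):
    return {"status": "draft", "payee": payee, "amount": amount}


def issue_payment(payee, amount):
    return {"status": "issued", "payee": payee, "amount": amount}


registry = ToolRegistry()
registry.register("draft_payment", draft_payment)
registry.register("issue_payment", issue_payment)

# The agent is granted drafting only; issuing stays with a human or another service.
AGENT_SCOPES = frozenset({"draft_payment"})
```

With these scopes, `registry.call(AGENT_SCOPES, "draft_payment", payee="Acme", amount=100)` returns a draft, while the same call with `"issue_payment"` raises `ToolNotPermitted`.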
Reliability
Long-horizon agents fail mid-task; reliability is whether they recover. Checkpointing, retries, and graceful degradation turn "the run died" into "the run resumed." Example: a multi-day reporting cycle that survives a flaky API instead of starting over.
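A minimal sketch of checkpointing with retries, assuming each step returns a JSON-serializable result and that a local file is an acceptable checkpoint store. `run_with_checkpoints` and its parameters are hypothetical names chosen for illustration.

```python
import json
import os
import time


def run_with_checkpoints(steps, checkpoint_path, max_retries=3, base_delay=0.0):
    """Run (name, fn) steps in order, saving progress after each one."""
    done = {}
    if os.path.exists(checkpoint_path):
        # Resume: load the results of steps that finished in an earlier run.
        with open(checkpoint_path) as f:
            done = json.load(f)

    for name, fn in steps:
        if name in done:
            continue  # already completed, do not repeat the work
        for attempt in range(max_retries):
            try:
                done[name] = fn()
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise  # out of retries; the checkpoint keeps earlier steps
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    return done
```

If a step exhausts its retries, the run stops, but calling the function again with the same checkpoint path skips every step that already finished.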
Performance efficiency
For AI, performance is largely perceived latency. Stream output, acknowledge instantly, and route easy steps to fast models — the Doherty threshold in architectural form. Example: token-by-token streaming instead of an eight-second blank spinner.
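The "acknowledge instantly, then stream" pattern can be sketched with plain generators. This is an assumption-laden illustration: `stream_response`, `fake_model`, and the `ACK` text are hypothetical, and `fake_model` stands in for whatever streaming model client is actually in use.

```python
from typing import Iterable, Iterator

ACK = "Working on it...\n"


def stream_response(model_tokens: Iterable[str]) -> Iterator[str]:
    # Acknowledge before touching the model, so the user sees output at once.
    yield ACK
    # Then forward tokens as they arrive instead of buffering the full answer.
    for token in model_tokens:
        yield token


def fake_model(prompt: str) -> Iterator[str]:
    # Stand-in for a streaming model client; a real one would yield tokens
    # as they arrive over the network.
    for word in f"Summary of {prompt}".split():
        yield word + " "


if __name__ == "__main__":
    for chunk in stream_response(fake_model("the Q3 report")):
        print(chunk, end="", flush=True)
    print()
```

Because generators are lazy, the acknowledgement is emitted before the model produces its first token, which is the point of the pattern.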
Cost optimization
The biggest AI cost lever is to stop forcing one premium model to do everything. Route by capability (a cheap tier for classification, a reasoning tier for the hard parts) behind a multi-model gateway. Example: classifying a document stack on a fast model and reserving the expensive one for the analysis.
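Routing by capability can be as simple as a lookup table in front of the models. In the sketch below, the tier names, task types, and per-token prices are made-up placeholders for illustration, not real model names or vendor pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # placeholder price, not real pricing


# Hypothetical tiers and routing table.
TIERS = {
    "fast": ModelTier("fast-small", 0.25),
    "reasoning": ModelTier("reasoning-large", 4.0),
}
ROUTES = {"classify": "fast", "extract": "fast", "analyze": "reasoning"}


def route(task_type: str) -> ModelTier:
    # Unknown task types fall back to the stronger tier rather than risk
    # a poor answer from the cheap one.
    return TIERS[ROUTES.get(task_type, "reasoning")]


def estimated_cost(jobs) -> float:
    # jobs is a list of (task_type, token_count) pairs.
    return sum(route(task).cost_per_1k_tokens * tokens / 1000 for task, tokens in jobs)
```

Under these placeholder prices, classifying on the fast tier and analyzing on the reasoning tier costs less than sending both jobs to the reasoning tier, and the gap widens as the share of easy work grows.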
Operational excellence
You can't run what you can't see. Observability, logging, and a clear ownership model are what make an AI system operable rather than a black box that someone babysits. Example: an audit trail that's also your debugging surface when an output looks wrong.
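To make the "audit trail as debugging surface" idea concrete, here is a small in-memory sketch. `AuditLog` and its field names are hypothetical; a production system would write to durable, append-only storage instead of a Python list.

```python
import time
import uuid


class AuditLog:
    """In-memory audit trail keyed by request ID (illustrative only)."""

    def __init__(self):
        self.records = []

    def new_request(self, **fields):
        # Every request gets an ID that ties its later events together.
        request_id = str(uuid.uuid4())
        self.record(request_id, "request_received", **fields)
        return request_id

    def record(self, request_id, event, **fields):
        entry = {"ts": time.time(), "request_id": request_id, "event": event, **fields}
        self.records.append(entry)
        return entry

    def trace(self, request_id):
        # The same records that satisfy an auditor also answer
        # "what happened on this request?" when an output looks wrong.
        return [r for r in self.records if r["request_id"] == request_id]
```

Calling `trace(request_id)` returns the ordered events for one request, such as which model was called and which tools ran.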
The pillar the cloud frameworks are missing
AI systems need one more concern that the originals don't name: governance and responsible AI. That means impact assessment, human oversight, and provability, the domain of ISO/IEC 42001 and the NIST AI RMF. Treat it as a first-class pillar, not a footnote under security.
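Human oversight can be expressed in code as an approval gate. The sketch below assumes an impact label has already been assigned by some assessment step; `ProposedAction`, `execute_with_oversight`, and the "low"/"high" labels are hypothetical and are not drawn from either standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    description: str
    impact: str  # hypothetical label, e.g. "low" or "high"


def execute_with_oversight(
    action: ProposedAction,
    run: Callable[[], object],
    approve: Callable[[ProposedAction], bool],
) -> dict:
    """Run low-impact actions directly; anything else needs a human decision."""
    decision = {"action": action.description, "impact": action.impact, "result": None}
    if action.impact != "low":
        # Fail closed: unknown labels are treated like high impact.
        decision["approved_by_human"] = bool(approve(action))
        if not decision["approved_by_human"]:
            return decision  # nothing is executed without approval
    decision["result"] = run()
    return decision
```

The returned record captures what was proposed, whether a human approved it, and what happened, so each decision leaves evidence behind.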
Well-Architected didn't anticipate agents, but its pillars describe a governed AI system almost perfectly once you add responsible AI as a pillar of its own.
We build to these lenses on Cloudflare, AWS, and Azure, so the systems we hand over are production-grade by construction, not just demo-grade.