Building Agent Systems: Architecture, Tradeoffs, and Lessons Learned

A practical look at how I think agent systems should be structured, how I built mine, what worked, what failed, and what I would improve.
Agent architecture matters more than the word agent. In this post, I walk through the main architecture patterns, explain how I structured my own persistent agent system, and share the biggest lessons I learned about memory, routing, verification, token efficiency, and bounded autonomy.
A lot of the conversation around AI agents is still too loose to be useful.
People talk about “agents” as if the word itself explains the system. It does not. It tells you very little about how the thing is actually built, how reliable it is, how much responsibility it can carry, or where it will fail.
That is why I have become increasingly skeptical of agent talk that stays at the level of demos, labels, or vibes.
Once you actually start building agent systems, the interesting questions get much less glamorous:
- where should the system be deterministic?
- where should it be non-deterministic?
- what should it remember?
- what should it forget?
- when should it act?
- when should it stop?
- when should it ask?
- when should it escalate?
- how many moving parts are actually necessary?
That is what this post is about.
Not just what agent architectures exist in theory, but how I think about them in practice, how I structured my own system, what worked, what did not, and what I would improve.
Architecture matters more than the word “agent”
The term agent gets used for too many different things:
- a chatbot with tool calling
- a multi-step workflow with an LLM in the middle
- a planner that produces task sequences
- a multi-agent orchestration layer
- a persistent system with memory, routing, tools, and long-lived state
These are not all the same thing.
The architecture matters more than the label because it determines:
- what kind of responsibility the system can carry
- how predictable it is
- how expensive it is
- how much supervision it needs
- how well it recovers from failure
- how safely it can operate
- whether it improves over time or just repeats itself more efficiently
A lot of people jump too quickly from “LLM with a tool” to “agent.” I think that skips the real work.
The real work is deciding what kind of system you are actually building.
The main architecture patterns
It helps to picture agent architecture as a ladder. Not because every system has to climb all the way up, but because each step introduces new power, new cost, and new failure modes.
1. Chatbot
- input
- model response
- no meaningful action layer
- no durable continuity
Useful, sometimes surprisingly capable, but still mostly an interface.
2. Tool loop
This is the first pattern that starts to feel agent-like.
- inspect the request
- decide whether it needs a tool
- call the tool
- interpret the result
- continue or respond
This pattern is simple, effective, and underrated. For many use cases, it is enough.
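The loop above can be sketched in a few lines. Everything here is hypothetical scaffolding, not a real API: the calculator tool and the keyword-based `decide_tool` stand in for the model's judgment about whether a tool is needed.

```python
from typing import Callable, Optional

# Hypothetical tool registry; a real system would wire these to actual services.
TOOLS: dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def decide_tool(request: str) -> Optional[str]:
    # Stand-in for the model's judgment: arithmetic goes to the calculator.
    if any(op in request for op in "+-*/"):
        return "calculator"
    return None

def tool_loop(request: str, max_steps: int = 3) -> str:
    """inspect -> decide -> call tool -> interpret -> continue or respond"""
    context = request
    for _ in range(max_steps):
        tool = decide_tool(context)
        if tool is None:
            return f"answer: {context}"   # no tool needed: respond
        context = TOOLS[tool](context)    # call, fold the result back in, continue
    return f"answer: {context}"           # step budget exhausted

tool_loop("2+3")     # routes through the calculator, then responds
tool_loop("hello")   # responds directly
```

The `max_steps` cap matters even in a toy version: an unbounded tool loop is one of the easiest ways for an agent to burn tokens going nowhere.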
3. Planner/executor split
Here the architecture separates planning from execution.
One layer handles decomposition and sequencing. Another handles tool use and actual execution.
This can help with complex tasks, but it can also become heavier than necessary if the planning layer is mostly decorative.
4. Orchestrated multi-agent systems
This is where work gets split across specialized roles:
- researcher
- planner
- writer
- coder
- reviewer
- tester
- coordinator
This can be useful, but it is also overused. A lot of multi-agent systems are mostly architecture theater.
If a single well-routed agent can do the job, start there.
5. Persistent agent system
This is where things get more interesting.
A persistent system does not just act across steps. It carries continuity across time.
- what should persist?
- what belongs in memory?
- what should become an operational rule?
- how do past decisions affect future behavior?
At this point you stop building a task solver and start building an operating system for behavior.
6. Enterprise hybrid systems
This is where many serious business systems will end up.
Not pure agents. Not pure workflows. Hybrids.
- deterministic scaffolding
- bounded non-deterministic reasoning
- approvals
- auditability
- integrations
- observability
- escalation paths
In practice, many enterprise “agents” are really governed orchestration systems. That is not a weakness. It is often exactly right.

What good architecture should do
If I strip away the hype, I think good agent architecture should do a few things well.
1. Separate deterministic and non-deterministic zones
Reasoning, interpretation, and planning can benefit from non-determinism.
But things like:
- approvals
- validation
- exact file targeting
- publishing rules
- state changes
- side-effect confirmation
should often stay deterministic.
One of the biggest mistakes in agent design is letting the fuzzy parts leak into the parts that require exactness.
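One way to keep that boundary honest is to put plain code between the model's proposal and the side effect. As a rough sketch, assuming a hypothetical publishing directory: the model may *suggest* a target file, but the exact targeting rule is enforced deterministically.

```python
from pathlib import Path

ALLOWED_DIR = Path("/workspace/posts")  # hypothetical publishing root

def validate_target(proposed: str) -> Path:
    """Deterministic zone: exact file targeting is checked with plain code,
    never left to the model's judgment."""
    path = (ALLOWED_DIR / proposed).resolve()
    if ALLOWED_DIR.resolve() not in path.parents:
        raise ValueError(f"target {path} escapes {ALLOWED_DIR}")
    if path.suffix != ".md":
        raise ValueError("only markdown files may be published")
    return path
```

The model can be as creative as it likes upstream; by the time a path reaches the filesystem, the fuzzy part is over.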
2. Treat verification as first-class
A lot of systems are too eager to treat execution as success.
That is wrong.
The architecture should distinguish:
- command ran
- request returned 200
- tool reported success
- target state actually changed
Those are not the same thing.
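A concrete way to honor that distinction is to define success as an observed state change, not a returned call. A minimal sketch for file writes, where the read-back is the verification:

```python
from pathlib import Path

def verified_write(path: Path, content: str) -> bool:
    """'The call returned' and 'the target changed' are different facts.
    Trust the read-back, not the write call."""
    path.write_text(content)     # the command ran
    observed = path.read_text()  # inspect the actual target state
    return observed == content   # success only if the state really changed
```

The same pattern generalizes: after publishing, fetch the page; after messaging, check the delivery status; after a config change, read the config back.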
3. Keep memory selective
Memory is not just retention. It is curation.
- transient working context
- task context
- durable preferences
- operational rules
- long-term memory worth preserving
A system that remembers everything is usually just under-disciplined.
4. Route work by type
Different tasks need different handling.
Research, publishing, coding, reminders, messaging, and system changes should not all go through the same path just because one model can touch all of them. Routing should account for:
- task type
- risk level
- determinism requirements
- execution environment
- verification needs
5. Use complexity only when it earns its keep
Do not add:
- multiple agents
- extra memory layers
- orchestration stages
- planning loops
- approval flows
unless they solve a real problem.
Architecture should reduce failure and cognitive load, not produce a more photogenic diagram.
The architecture I actually use
The system I have been building is not a raw multi-agent swarm, and it is not a simple single-loop assistant either.
It is closer to a persistent bounded orchestration architecture built around a primary agent identity, selective memory, tool access, routed execution, and delegated specialist paths when needed.
1. A primary persistent agent layer
- continuity
- a stable operating posture
- memory that accumulates coherently
- an identity layer that persists across sessions
- one main interface for reasoning, planning, and interaction
That makes the system feel less like a random collection of runs and more like an ongoing construct.
2. Multiple memory layers
The system works better when memory is layered:
- live state
- durable preferences
- daily notes
- monthly summaries
- long-term memory
- indexed retrieval
That structure matters because not all memory should behave the same way.
3. Structured persistent memory, not transcript hoarding
This was one of the biggest lessons.
Persistent memory does not mean replaying raw history into the model forever. That burns tokens, slows the system down, and makes retrieval noisier.
What worked much better was a layered memory model with selective retrieval:
- daily files for raw chronology
- monthly summaries for compressed history
- long-term memory for durable truths
- indexed search to retrieve only the relevant slice
That shift was not just about neatness. It was about making persistence affordable and useful.
A persistent system cannot drag its whole past into every task. It has to compress, promote, and retrieve intelligently.
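The compress-and-promote step can be sketched mechanically. This is a toy, not my actual implementation: the `durable:` prefix is a hypothetical marker for promotion-worthy facts, where a real system would use the model (or explicit rules) to decide what qualifies.

```python
def promote(daily: dict[str, list[str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Compress daily notes into monthly buckets and promote durable truths
    into long-term memory, so raw chronology never has to be replayed."""
    monthly: dict[str, list[str]] = {}
    long_term: list[str] = []
    for day, notes in sorted(daily.items()):
        month = day[:7]  # "2026-05-14" -> "2026-05"
        for note in notes:
            if note.startswith("durable: "):
                long_term.append(note.removeprefix("durable: "))
            else:
                monthly.setdefault(month, []).append(note)
    return monthly, long_term
```

The shape is what matters: each layer holds less, lasts longer, and costs fewer tokens to consult than the layer below it.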
4. Search-first retrieval instead of raw loading
Instead of rereading large files by default, the system works better when it:
- searches first
- pulls only the relevant snippet
- reads the smallest useful slice
- escalates to raw file loads only when necessary
That did a few things at once:
- cut token burn dramatically
- improved retrieval quality
- reduced irrelevant context pollution
- made memory feel like a usable system instead of a dump
This sounds like a small optimization, but it is not. In a persistent architecture, memory design is also token design.
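The search-then-slice behavior is simple to sketch. Assuming memory files are plain text keyed by name, a naive substring search stands in for a real index:

```python
def search_first(query: str, files: dict[str, str], window: int = 80) -> list[str]:
    """Return only the smallest useful slice around each hit instead of
    loading whole files into context. Escalate to a full read only if needed."""
    q = query.lower()
    hits = []
    for name, text in files.items():
        idx = text.lower().find(q)
        if idx == -1:
            continue
        start = max(0, idx - window)
        end = min(len(text), idx + len(query) + window)
        hits.append(f"{name}: ...{text[start:end]}...")
    return hits
```

Even this crude version changes the economics: context cost scales with the number of hits and the window size, not with the size of the memory store.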

5. Routing before action
Not every prompt should be handled the same way. Some need:
- memory lookup
- operational files
- external tools
- constrained workflows
- delegated coding paths
- publishing logic
So instead of treating every message as “just answer,” the system works better when it decides what kind of request it is handling before acting.
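A routing step can be as small as a classification that runs before anything else. This keyword router is a hypothetical stand-in; a real system would classify with the model and then enforce the chosen path deterministically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    kind: str            # what sort of request this is
    side_effects: bool   # does handling it touch the outside world?
    needs_approval: bool

def route(request: str) -> Route:
    """Decide what kind of request this is *before* acting on it."""
    text = request.lower()
    if "publish" in text:
        return Route("publishing", side_effects=True, needs_approval=True)
    if "remind" in text:
        return Route("reminder", side_effects=True, needs_approval=False)
    if "research" in text or "look up" in text:
        return Route("research", side_effects=False, needs_approval=False)
    return Route("chat", side_effects=False, needs_approval=False)
```

The payoff is that everything downstream can branch on `side_effects` and `needs_approval` instead of re-deriving risk from scratch on every message.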
6. Tool access with boundaries
The architecture works best when tool access is grouped and bounded:
- file reads/writes
- web research
- messaging
- image generation
- publishing
- shell execution
- subagent delegation
And more importantly, the system should know:
- which tools are safe for internal work
- which create side effects
- which need approval
- which demand verification
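Those distinctions are easiest to keep honest when they live in the tool registry itself rather than in prose. A minimal sketch, with hypothetical tools mirroring the bounded groups above:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    name: str
    side_effects: bool        # does it change the outside world?
    needs_approval: bool      # must someone sign off first?
    needs_verification: bool  # must the outcome be checked afterwards?
    run: Callable[[str], str]

# Hypothetical registry; each tool declares its own boundaries.
REGISTRY = {
    "read_file": Tool("read_file", False, False, False,
                      lambda p: f"<contents of {p}>"),
    "publish":   Tool("publish", True, True, True,
                      lambda p: f"published {p}"),
}

def invoke(name: str, arg: str, approved: bool = False) -> str:
    tool = REGISTRY[name]
    if tool.needs_approval and not approved:
        raise PermissionError(f"{name} requires approval before it runs")
    return tool.run(arg)
```

With the metadata attached to the tool, the approval check cannot be forgotten in one call site and remembered in another.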
7. Delegated specialist paths
Delegation is most useful when:
- the task is long-running
- it needs a different environment
- it benefits from specialist handling, like coding or structured research
So the architecture includes specialist paths, but they are support structures, not the center of the system.
8. Approval and verification layer
For any system that can:
- publish
- message externally
- modify files
- run commands
- call real APIs
- create side effects
you need a clear distinction between:
- safe internal action
- external action
- sensitive action
- reversible action
- high-risk action
And you need verification after execution, not just before it.
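Both halves of that layer fit in one small wrapper. This is a sketch under stated assumptions: the risk ladder and its "approval starts at external" threshold are illustrative choices, not a prescription.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical risk ladder, ordered from safest to most dangerous.
RISK = {"internal": 0, "reversible": 1, "external": 2, "sensitive": 3, "high_risk": 4}

@dataclass
class Action:
    name: str
    risk: str
    run: Callable[[], object]
    verify: Optional[Callable[[], bool]] = None  # post-execution check

def execute(action: Action, approved: bool = False) -> object:
    # Anything at or above "external" needs explicit approval before running.
    if RISK[action.risk] >= RISK["external"] and not approved:
        raise PermissionError(f"{action.name} ({action.risk}) needs approval")
    result = action.run()
    # Verification happens after execution, not just before it.
    if action.verify is not None and not action.verify():
        raise RuntimeError(f"{action.name} ran but the target state did not change")
    return result
```

The important structural point is that `verify` runs unconditionally after `run`: approval gates the decision to act, verification gates the claim that the action worked.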

What worked, what broke, and what I’d improve
What worked
- Persistence really does matter. It reduces rediscovery and makes the system feel cumulative instead of episodic.
- Layered memory works better than flat memory. Relevance and maintainability both improved.
- Search-first retrieval saves both tokens and attention. This was one of the clearest wins.
- Simple routed structure beats flashy swarms. A primary agent with good routing and bounded delegation often works better than a noisier multi-agent setup.
- Verification matters more than elegance. A less elegant system that checks outcomes is more useful than a prettier one that trusts success signals too easily.
What did not work as well
- Writing lessons down was not enough. Without enforcement, the system could still repeat the same categories of mistake.
- Loose workflows created repeat errors. Publishing and other side-effect domains need canonical paths.
- Persistence increased the design burden. Memory hygiene, file discipline, and state coherence all became ongoing responsibilities.
- Bad retrieval discipline made persistence worse. If too much context gets loaded, continuity turns into clutter.
What I would improve
- stronger preflight enforcement through checklists and runbooks
- cleaner task and state tracking
- better memory promotion rules from daily notes to monthly summaries to durable memory
- better observability into why routing or tool decisions happened
- tighter retrieval discipline so raw context gets loaded even less often
- more canonical domain workflows for publishing, messaging, and similar side-effect tasks
Final thought
The best agent architecture is not the most complex one.
It is the one that gives the system:
- enough autonomy to be useful
- enough structure to be reliable
- enough continuity to improve over time
- enough memory discipline to stay efficient
- enough boundaries to stay sane
That is the tradeoff.
Not maximum freedom.
Not maximum cleverness.
Not maximum number of agents.
Not maximum theatrics.
Just enough architecture for the system to carry real responsibility without pretending to be more than it is.
That is what I care about now.
And I think that is where agent systems start becoming genuinely valuable.