AutonomousHQ

Your AI Agent Forgets Everything. Here's What to Do About It.

Every autonomous company hits the same wall: the agent that did great work last week has no idea what it built. Context windows reset. Memory is not free. Here's how the serious builders are solving it.

analysis, ai agents, memory, architecture, autonomous companies

Felix, the autonomous commerce agent built by Nat Eliason, runs continuously. It creates products, manages listings, processes orders, and monitors revenue. Ask it about a product it built three months ago and it knows. That is not an accident. That is the result of deliberate architecture decisions most people skip when they are trying to get an agent working fast.

The context window problem is the silent killer of autonomous operations. Agents fail tasks not because the underlying model is incapable, but because the agent has lost track of what it is doing, what it has already tried, and why it made the decisions it made. The context window resets. The memory is gone. The next run starts from scratch.

What the context window actually is

A context window is the chunk of text an AI model can hold in working memory during a single inference call. For Claude Sonnet 4.6, that is 1,000,000 tokens. Sounds large. In practice, for a complex autonomous task involving file reads, tool call outputs, reasoning steps, and prior conversation, even a window that size fills faster than expected.

When the window fills, something has to give. Older content gets dropped. The agent begins to lose track of decisions made earlier in the same task. This is the in-session problem: memory loss within a single run.

The between-session problem is different and worse. Every time an agent starts a new session, the context window is empty. Everything learned in the previous run is gone unless it was explicitly saved somewhere and explicitly loaded back. Most simple agent setups do not do this. The agent starts fresh every time, re-discovers the same things, makes the same mistakes, and the human watching has to wonder why it keeps asking the same questions.

Three approaches that actually work

1. Structured external memory

The most common and reliable approach: write important information to persistent storage and load it at the start of each session.

This sounds obvious. In practice, most people do not do it systematically. They let the agent run, it produces some output, that output lives in a chat log or a file somewhere, and the next session starts with a blank slate because nobody built the retrieval step.

Felix uses structured logs. Each agent action is written to a database with timestamps, outcomes, and status. When the agent starts a new session, it pulls the relevant history. It knows what products exist. It knows which ones are selling. It knows what it tried last week that did not work.

The key design decision: what goes into memory and what form does it take? Long-form narrative is token-expensive to retrieve and hard to summarise. Structured records, short and scannable, are cheaper to load and easier for the model to process. Agents that write their own memory in verbose prose end up with a context problem of their own making.
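A minimal sketch of the structured-log pattern, assuming a SQLite store. The table schema, agent name, and the short `action: outcome [status]` format are illustrative, not Felix's actual implementation:

```python
import sqlite3
import time

def init_store(conn):
    # One row per agent action: short, scannable, cheap to load back as context.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS actions (
               ts REAL, agent TEXT, action TEXT, outcome TEXT, status TEXT
           )"""
    )

def log_action(conn, agent, action, outcome, status):
    conn.execute(
        "INSERT INTO actions VALUES (?, ?, ?, ?, ?)",
        (time.time(), agent, action, outcome, status),
    )
    conn.commit()

def load_history(conn, agent, limit=20):
    # Most recent records first; rowid gives a stable insertion order.
    rows = conn.execute(
        "SELECT action, outcome, status FROM actions "
        "WHERE agent = ? ORDER BY rowid DESC LIMIT ?",
        (agent, limit),
    ).fetchall()
    return [f"{action}: {outcome} [{status}]" for action, outcome, status in rows]

conn = sqlite3.connect(":memory:")
init_store(conn)
log_action(conn, "felix", "create_listing", "listing #12 live", "done")
log_action(conn, "felix", "price_test", "no lift at +10%", "failed")
print(load_history(conn, "felix"))
```

At session start, the returned lines get prepended to the agent's prompt. Compare the cost of twenty one-line records against twenty paragraphs of narrative: the structured form is what keeps this affordable.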

2. Hierarchical summarisation

For agents running long tasks, the in-session memory problem requires a different approach: periodic summarisation.

The pattern: every N steps, the agent produces a compressed summary of what has happened so far. The full detail is archived. The summary becomes the working context for the next phase. This is not automatic. It requires the agent to be explicitly instructed to do it, and the instruction needs to be specific: what to summarise, in what format, stored where.

Without this, long-running agents either hit the context limit and start losing information, or the human running the operation has to manually intervene to compress the context. That is a hidden human-labour cost that does not show up in the token bill.
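The every-N-steps pattern can be sketched as follows. Here `summarise` is a placeholder for the model call that would actually compress the archived detail; the step format and the threshold of five are arbitrary assumptions:

```python
ARCHIVE = []          # full detail, persisted outside the context window
working_context = []  # what the agent actually carries forward
SUMMARY_EVERY = 5

def summarise(steps):
    # Stand-in for an LLM call: "compress these steps, one headline each".
    return "SUMMARY: " + "; ".join(s.split(" - ")[0] for s in steps)

def record_step(step):
    working_context.append(step)
    detail = [s for s in working_context if not s.startswith("SUMMARY:")]
    if len(detail) >= SUMMARY_EVERY:
        ARCHIVE.extend(detail)  # full detail is archived, never lost
        summaries = [s for s in working_context if s.startswith("SUMMARY:")]
        # Working context collapses to prior summaries plus one new one.
        working_context[:] = summaries + [summarise(detail)]

for i in range(12):
    record_step(f"step {i} - long tool output and reasoning here")

print(working_context)  # two summaries plus the two most recent raw steps
print(len(ARCHIVE))     # 10 archived steps
```

The point of the sketch is the shape, not the code: detail flows out to the archive on a fixed cadence, and only compressed summaries accumulate in the window.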

3. Specialised memory agents

The more sophisticated approach: a dedicated agent whose job is managing memory for the rest of the team.

This is the architecture Nat Eliason has described for Felix and the direction serious autonomous company builders are moving. One agent handles all read/write operations to a shared knowledge store. Other agents request information from it rather than maintaining their own context. When a working agent needs to know what happened two weeks ago, it asks the memory agent, which retrieves and returns the relevant records.

The advantage: memory management is consistent across the team. The disadvantage: another agent to maintain, another point of failure, another set of instructions to get right. For a small operation with three or four agents, this is probably over-engineering. For an operation aiming to run ten or more agents on overlapping tasks, the memory agent pays for itself quickly.
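To make the division of labour concrete, here is a toy version of the architecture, with the store as an in-memory list. The class name, the `tell`/`ask` interface, and the topic tags are all assumptions for illustration; in a real system each method would be a tool exposed to the worker agents:

```python
from datetime import date

class MemoryAgent:
    """Owns all read/write access to the shared knowledge store.
    Worker agents never touch the store directly."""

    def __init__(self):
        self._records = []

    def tell(self, agent, topic, fact):
        self._records.append({"day": date.today().isoformat(),
                              "agent": agent, "topic": topic, "fact": fact})

    def ask(self, topic):
        # Retrieval by topic keeps each worker's context small.
        return [r["fact"] for r in self._records if r["topic"] == topic]

memory = MemoryAgent()
memory.tell("ops-agent", "pricing", "raised widget price 10% on Tuesday")
memory.tell("ops-agent", "pricing", "no conversion drop after 48h")
memory.tell("content-agent", "blog", "published memory-architecture post")

# A working agent asking about a decision made weeks ago:
print(memory.ask("pricing"))
```

The design choice worth noticing: because every write goes through one interface, the record format stays uniform no matter which worker produced it.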

The retrieval problem

Storing information is only half the problem. Retrieval is where most implementations fall down.

The naive approach: load all memory at the start of every session. This works until the memory store grows. At some point, loading the full history exceeds the token budget, and now the agent has a context problem again, this time caused by the solution to the original context problem.

The right approach: selective retrieval. Load only what is relevant to the current task. This requires either a semantic search layer (embedding-based retrieval, where the agent's current task is matched against stored memory) or structured tagging (each memory record tagged by project, entity, or date, and retrieved by filter).

Embedding-based retrieval is more flexible but requires infrastructure: a vector database, an embedding model, and the overhead of maintaining the index. Structured retrieval is simpler and cheaper but depends on the agent tagging its memory records correctly, which is a new failure mode.
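The structured-tagging variant can be sketched without any infrastructure at all. The tag names, the four-characters-per-token estimate, and the budget figure are illustrative assumptions:

```python
records = [
    {"tags": {"project:felix", "week:7"}, "text": "Launched mug listing; 14 sales."},
    {"tags": {"project:felix", "week:8"}, "text": "Price test failed at +10%."},
    {"tags": {"project:blog"},            "text": "Drafted memory post outline."},
]

def retrieve(records, required_tags, token_budget=50):
    # Filter by tags, then load records until the token budget runs out.
    selected, used = [], 0
    for r in records:
        if required_tags <= r["tags"]:  # record carries all required tags
            cost = len(r["text"]) // 4  # rough token estimate
            if used + cost > token_budget:
                break  # stop before blowing the context budget
            selected.append(r["text"])
            used += cost
    return selected

print(retrieve(records, {"project:felix"}))
```

Note the explicit budget: selective retrieval only solves the problem if loading is capped, otherwise a popular tag recreates the load-everything failure at a smaller scale. The failure mode the text mentions is visible here too: a record tagged incorrectly at write time is simply invisible at read time.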

Neither is trivially easy. Both require deliberate upfront design. The operations that treat memory as an afterthought end up rebuilding their context management from scratch after the first time their agent starts a 45-minute task by re-discovering information it found last week.

What the tooling looks like in 2026

The agent frameworks have started to address this, though incompletely.

Paperclip provides task state persistence: an agent can read the status of an issue, including prior comments and outputs, at the start of each run. This is functional memory management for structured tasks. It does not help with unstructured context or cross-task knowledge.

NanoClaw's architecture passes structured state between agent turns, which reduces (but does not eliminate) the between-session memory problem. Agents can see prior tool call results and reasoning summaries, depending on how the session is configured.

Claude's Projects feature is a different approach: a persistent context layer that survives across conversations. For relatively small knowledge bases, this works. For large operations with extensive history, it runs into the same storage and retrieval constraints.

None of these is a complete solution. The builders who have context management working well have all built custom solutions on top of the available infrastructure, usually involving a database, a retrieval layer, and explicit prompting that tells agents what to store and when.

The practical starting point

If you are building an autonomous agent setup right now, the minimum viable memory architecture is this:

At the end of every agent run, the agent writes a structured summary to a file or database: what was completed, what was decided, what is outstanding. At the start of every agent run, the agent reads the most recent summary for its domain. This takes twenty minutes to implement and eliminates the most common form of agent amnesia.
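The whole baseline fits in a few lines. A sketch assuming one JSON summary file per agent domain; the file name and the three fields mirror the completed/decided/outstanding split above:

```python
import json
from pathlib import Path

# One structured summary file per agent domain (file name is an assumption).
SUMMARY_FILE = Path("memory_commerce.json")

def end_of_run(completed, decided, outstanding):
    # Written by the agent as its final action in every session.
    SUMMARY_FILE.write_text(json.dumps({
        "completed": completed, "decided": decided, "outstanding": outstanding,
    }, indent=2))

def start_of_run():
    # Read by the agent as its first action in every session.
    if SUMMARY_FILE.exists():
        return json.loads(SUMMARY_FILE.read_text())
    return None  # genuinely first run: blank slate is correct here

end_of_run(
    completed=["relisted out-of-stock mugs"],
    decided=["hold prices until the A/B test ends"],
    outstanding=["respond to supplier email"],
)
print(start_of_run()["outstanding"])
```

In practice the write and read are just two instructions in the agent's system prompt pointing at this file. That is the entire twenty-minute implementation.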

The more sophisticated architecture, with vector retrieval and memory agents, is worth building when you are running multiple agents on interdependent tasks and the simple approach is creating conflicts or gaps. Do not over-engineer early. Do build the baseline.

The agents that seem to work well over time are not running on better models. They are running on better memory. The context window problem is solvable. It just requires treating memory as a first-class concern from the start, not a fix-it-later detail.


Follow along. Tim is building and documenting the AutonomousHQ agent architecture live on YouTube, including how we are approaching context management for a six-agent team. Sign up to the newsletter for the weekly breakdown of what is working and what is not.