The 5-Layer Memory Stack for Claude Agents

7 minute read. Updated June 2026.

TL;DR

An agent that forgets everything between calls is just an expensive to-do list. Five memory types fix it, each for a different job: a rolling buffer for the live session, entity memory for facts, episodic memory for history, semantic memory for search, and procedural memory for how-to. Build them in order, starting today.

Why your agent keeps forgetting

You hand an agent a 2,000-word context dump at the top of every conversation and watch it forget all of it the moment the session ends. Next call: start over, same questions, same setup. The fix is not a better prompt - it is memory architecture. Wire in the right memory type for each job and your agents pick up exactly where they left off. There are five, and each builds on the last.

1. Conversation buffer

The running transcript of the current session, passed back into context each turn. Raw buffers get fat fast - a 90-minute call can run to 8,000 tokens before the useful part. The fix is a rolling summary: every 10 exchanges, compress the older turns into a short structured summary, keep the last 3 raw turns for immediate context, and carry the summary forward instead of the wall of text.

Conversation buffer compression

You are maintaining a conversation buffer. Every 10 turns, compress the
history into:

DECISIONS MADE: [decisions confirmed in this block]
KEY CONTEXT: [2-3 sentences of the most important facts]
OPEN QUESTIONS: [anything unresolved]

Then discard the full transcript for that block and carry only this
summary forward. Always keep the last 3 raw exchanges intact.
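
The rolling summary described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `RollingBuffer`, `fake_summarize`, and the two constants are hypothetical names, and `fake_summarize` stands in for a real model call that would send the compression prompt along with the old turns.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SUMMARIZE_EVERY = 10  # exchanges per compression block (from the text above)
KEEP_RAW = 3          # most recent exchanges always kept verbatim


@dataclass
class RollingBuffer:
    # Assumption: summarize(previous_summary, old_turns) returns the new summary.
    # In practice it would call the model with the compression prompt.
    summarize: Callable[[str, List[str]], str]
    summary: str = ""
    raw_turns: List[str] = field(default_factory=list)
    turns_since_compress: int = 0

    def add_exchange(self, text: str) -> None:
        self.raw_turns.append(text)
        self.turns_since_compress += 1
        if self.turns_since_compress >= SUMMARIZE_EVERY:
            self._compress()

    def _compress(self) -> None:
        # Fold everything except the last KEEP_RAW exchanges into the summary.
        old = self.raw_turns[:-KEEP_RAW]
        recent = self.raw_turns[-KEEP_RAW:]
        self.summary = self.summarize(self.summary, old)
        self.raw_turns = recent
        self.turns_since_compress = 0

    def context(self) -> str:
        # The text passed back to the model on each turn.
        parts = []
        if self.summary:
            parts.append("SUMMARY SO FAR:\n" + self.summary)
        parts.extend(self.raw_turns)
        return "\n\n".join(parts)


def fake_summarize(previous: str, turns: List[str]) -> str:
    # Stand-in for a model call so the sketch runs offline.
    prefix = previous + " " if previous else ""
    return prefix + f"[{len(turns)} exchanges compressed]"
```

After ten calls to `add_exchange`, the buffer holds one summary plus the last three raw exchanges, and `context()` returns only those.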

2. Entity memory

A dedicated store for named things: people, companies, projects, preferences, decisions. Every time the agent learns something about an entity, it writes it; every time the entity comes up, it reads first. This is the difference between an agent that says 'you mentioned your budget is around 5K' and one that asks the same question for the fourth time. Store records as simple JSON in a file the agent can read and write - no database required.

Entity memory write

At the end of every conversation, extract new facts about named entities
(people, companies, projects, preferences, constraints) as:

{
  "entity": "[name]",
  "type": "[person | company | project | preference | constraint]",
  "fact": "[one-sentence statement]",
  "confidence": "[high | medium]",
  "source": "[direct statement | inferred]"
}

Save only high or medium confidence facts. Do not save guesses about
emotional states. Append - never overwrite existing facts.
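
One way to store these records is sketched below, assuming one JSON object per line (JSON Lines) in a local file so that writes are pure appends. The file layout and the function names `append_facts` and `facts_about` are illustrative assumptions, not something the prompt above prescribes.

```python
import json
from pathlib import Path
from typing import Dict, List

ALLOWED_CONFIDENCE = {"high", "medium"}


def append_facts(path: Path, new_facts: List[Dict[str, str]]) -> int:
    """Append facts to the store; existing lines are never rewritten."""
    written = 0
    with path.open("a", encoding="utf-8") as f:
        for fact in new_facts:
            if fact.get("confidence") not in ALLOWED_CONFIDENCE:
                continue  # drop anything below medium confidence
            f.write(json.dumps(fact) + "\n")
            written += 1
    return written


def facts_about(path: Path, entity: str) -> List[Dict[str, str]]:
    """Read-before-answer: return every stored fact for one entity."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if r["entity"].lower() == entity.lower()]
```

The agent would call `append_facts` at session end with the extracted records, and `facts_about` whenever an entity comes up, injecting the result into context before it answers.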

3. Episodic memory

What happened, when, and what was decided - a timestamped event log. Where entity memory holds facts, episodic memory holds the arc of the relationship, so the agent can say 'in our second call we agreed to push the launch, and in the third you said the delay came down to a team change.' Give each session a one-paragraph entry and inject it at the start of the next as a short 'previously on' brief.

Episodic memory session log

At the end of this conversation, write a session log entry:

DATE: [today]
SESSION NUMBER: [increment from prior entries]
MAIN TOPIC: [one sentence]
COMMITMENTS MADE: [who committed to what, with any deadline]
SIGNALS: [any shift in tone, urgency, hesitation or enthusiasm]
NEXT SESSION BRIEF: [one sentence that could open the next call naturally]

Append this entry. Never edit prior entries.
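
The append-only log and the 'previously on' brief can be sketched as follows. The JSON Lines storage, the `SessionEntry` field names, and the helper names are assumptions made for illustration; in a real setup the field values would come from the model's session log entry.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class SessionEntry:
    date: str
    session_number: int
    main_topic: str
    commitments: str
    signals: str
    next_session_brief: str


def load_entries(path: Path) -> List[SessionEntry]:
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [SessionEntry(**json.loads(line)) for line in f if line.strip()]


def append_entry(path: Path, date: str, main_topic: str, commitments: str,
                 signals: str, next_session_brief: str) -> SessionEntry:
    # The session number increments from prior entries; old lines are untouched.
    number = len(load_entries(path)) + 1
    entry = SessionEntry(date, number, main_topic, commitments,
                         signals, next_session_brief)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")
    return entry


def previously_on(path: Path, last_n: int = 3) -> str:
    """Build the short brief injected at the start of the next session."""
    entries = load_entries(path)[-last_n:]
    if not entries:
        return ""
    lines = [
        f"Session {e.session_number} ({e.date}): {e.main_topic} "
        f"Commitments: {e.commitments}"
        for e in entries
    ]
    lines.append("Suggested opener: " + entries[-1].next_session_brief)
    return "\n".join(lines)
```

Limiting the brief to the last few sessions (`last_n`, a hypothetical parameter) keeps the injected context short while the full timeline stays on disk.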

4. Semantic memory

Search your own knowledge base by meaning, not keyword. Say you have 200 past emails, 47 proposals, and 31 debrief notes, and you want the agent to answer 'what do clients usually object to on pricing calls?' - a question that matches no single document by keyword. Semantic search finds the relevant excerpts across all of them. The one rule that matters most: chunk at the paragraph level. Documents are too big, sentences too small; paragraphs are the sweet spot. You do not need a heavy vector database to start.
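
Paragraph-level chunking with an in-memory search can be sketched as below. One loud caveat: `toy_embed` is a placeholder that counts words, so it matches shared vocabulary rather than meaning. For real semantic search you would pass an embedding model of your choice as the `embed` argument; everything else here (function names, the `doc_id#index` chunk ids) is also an illustrative assumption.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def chunk_paragraphs(doc_id: str, text: str) -> List[Tuple[str, str]]:
    """Split a document on blank lines: one chunk per paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(f"{doc_id}#{i}", p) for i, p in enumerate(paragraphs)]


def toy_embed(text: str) -> Vector:
    # Placeholder only: bag-of-words counts, not a semantic embedding.
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: List[Tuple[str, str]],
           embed: Callable[[str], Vector] = toy_embed,
           top_k: int = 3) -> List[Tuple[float, str, str]]:
    """Score every chunk against the query and return the best top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), cid, text) for cid, text in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

A linear scan like this is enough for a small library; the chunking function stays the same if you later move the vectors into a dedicated store.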

5. Procedural memory

The agent knowing how to do things - processes, not facts. This is what turns a smart chatbot into an operator: a library of named workflows it can call on demand, follow step by step, and report back on without you re-explaining. Things like 'run the new-member onboarding checklist' or 'pull the top 3 objections from this sales transcript and draft rebuttals.' The lever is iteration - every time a procedure produces a great result, update the saved version, and the agent gets better without retraining.
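
A procedure library can be as simple as a dictionary of named, versioned step lists. The sketch below is one possible shape, with hypothetical names throughout (`Procedure`, `ProcedureLibrary`, `save`, `render`); it keeps the library in memory and leaves persistence to whatever file format you already use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Procedure:
    name: str
    steps: List[str]
    version: int = 1


@dataclass
class ProcedureLibrary:
    procedures: Dict[str, Procedure] = field(default_factory=dict)

    def save(self, name: str, steps: List[str]) -> Procedure:
        # Saving under an existing name replaces the steps and bumps the version,
        # which is how an improved run becomes the new default.
        prior = self.procedures.get(name)
        version = prior.version + 1 if prior else 1
        proc = Procedure(name, list(steps), version)
        self.procedures[name] = proc
        return proc

    def render(self, name: str) -> str:
        # Text to inject into the prompt when the agent is asked to run `name`.
        proc = self.procedures[name]
        numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(proc.steps, 1))
        return (
            f"PROCEDURE: {proc.name} (v{proc.version})\n"
            "Follow these steps in order and report the result of each:\n"
            f"{numbered}"
        )
```

When a run produces a better result, calling `save` again with the refined steps updates the stored version without touching the model itself.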

Build them in this order

  1. Start with the conversation buffer - fix the 'loses the thread' problem first. 20-minute setup, immediate payoff.
  2. Add entity memory - a simple client profile file the agent reads at session start and writes at session end. Now it knows your clients by name and history.
  3. Build the episodic log - one paragraph per session, written by the agent and reviewed by you. A full relationship timeline in 30 days.
  4. Load your procedural library - take your 5 most repeated tasks and write them as named procedures. This is where the biggest time savings come from.
  5. Add semantic memory once your document library passes 50 files. Before that, entity and procedural memory cover most cases.

Common questions

  • Do I need all five memory types?

    No - build in order and stop where it pays off. The conversation buffer and entity memory solve the most common problems (losing the thread and asking the same questions twice). Add episodic, procedural and semantic memory as your needs grow.

  • Which memory type should I build first?

    The conversation buffer. It is a 20-minute setup that stops the agent losing the thread mid-session, and it cuts your context load immediately by compressing old turns instead of stuffing the whole transcript into every prompt.

  • Do I need a vector database for semantic memory?

    Not to start. A modest knowledge base runs fine on a lightweight local setup. The rule that matters more than the tool is chunking at the paragraph level - enough context to be useful, small enough to be precise.

  • What is the difference between entity and episodic memory?

    Entity memory stores facts about named things (this client's budget, this project's stack). Episodic memory stores the timeline of what happened and when. Facts versus history - you usually want both.

  • How is procedural memory different from a prompt?

    A prompt is a one-off instruction. Procedural memory is a saved, named workflow the agent reuses on demand and improves over time. It executes the steps the same way every time instead of improvising, which is what makes an agent feel like a reliable operator.

  • How does this relate to the MCP memory server setup?

    The MCP memory server is one way to persist entity and semantic memory across sessions. This guide is the architecture - the five types and when to use each; the memory-server guide is one concrete tool for storing them. They fit together.
