Memory Systems

How AI agents maintain context and remember information across interactions.

What are Agent Memory Systems?

Memory systems allow agents to retain and recall information beyond the immediate context window. They enable agents to learn from past interactions and maintain coherent long-term behavior.

Types of Memory

Agent memory systems typically combine multiple memory types for different purposes.

Short-Term Memory

The current conversation context. Limited by context window size.

Long-Term Memory

Persistent storage of past interactions, facts, and learned preferences.

Episodic Memory

Specific past events and interactions that can be recalled.

Semantic Memory

General knowledge and facts extracted from experiences.

Implementation Approaches

Various techniques for implementing agent memory.

Vector Stores

Store embeddings of past interactions for semantic retrieval.
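
A minimal sketch of the idea, using a toy bag-of-hashed-words embed() function as a stand-in for a real embedding model and an approximate-nearest-neighbor index:

```python
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Toy bag-of-hashed-words embedding; swap in a real embedding model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VectorMemory:
    def __init__(self):
        self.entries: list[tuple[list[float], str]] = []

    def add(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Rank stored memories by similarity to the query embedding.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

memory = VectorMemory()
memory.add("User prefers dark mode")
memory.add("Project name is learn-guide")
print(memory.recall("what theme does the user prefer?"))
```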

Conversation Summaries

Periodically summarize long conversations to preserve key information.
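
One way to sketch this is a rolling summary, where summarize() stands in for an LLM call and the turn thresholds are arbitrary:

```python
def summarize(text: str) -> str:
    """Placeholder for an LLM summarization call."""
    return f"[summary of {len(text.split())} words of earlier conversation]"

class SummarizingHistory:
    def __init__(self, max_turns: int = 20, keep_recent: int = 6):
        self.summary = ""           # compressed record of older turns
        self.turns: list[str] = []  # verbatim recent turns
        self.max_turns = max_turns
        self.keep_recent = keep_recent

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            # Fold older turns (plus any prior summary) into one new summary.
            old, self.turns = self.turns[:-self.keep_recent], self.turns[-self.keep_recent:]
            self.summary = summarize(self.summary + "\n" + "\n".join(old))

    def context(self) -> str:
        return (self.summary + "\n" if self.summary else "") + "\n".join(self.turns)
```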

Key-Value Stores

Store explicit facts and user preferences for direct lookup.
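
A minimal sketch backed by SQLite; the table and key names are illustrative:

```python
import sqlite3

class PreferenceStore:
    """Durable key-value store for explicit facts and user preferences."""

    def __init__(self, path: str = "agent_memory.db"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS prefs (key TEXT PRIMARY KEY, value TEXT)")

    def set(self, key: str, value: str) -> None:
        self.db.execute(
            "INSERT INTO prefs (key, value) VALUES (?, ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (key, value),
        )
        self.db.commit()

    def get(self, key: str, default: str | None = None) -> str | None:
        row = self.db.execute("SELECT value FROM prefs WHERE key = ?", (key,)).fetchone()
        return row[0] if row else default

store = PreferenceStore()
store.set("theme", "dark")
print(store.get("theme"))  # -> "dark"
```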

Hybrid Memory Patterns

Modern agents combine episodic and semantic memory for human-like recall. The MemGPT pattern and similar architectures treat memory as a first-class resource the agent actively manages.

MemGPT Architecture

Agents with explicit memory management—moving data between fast context and slow storage like an OS manages RAM and disk.
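
A rough sketch of the paging idea (not MemGPT's actual implementation), with made-up method names like page_out and page_in:

```python
class PagedMemory:
    """Context window as 'RAM', archive as 'disk'; the agent pages entries between them."""

    def __init__(self, context_limit: int = 4):
        self.context_limit = context_limit
        self.context: list[str] = []  # what the model sees every turn
        self.archive: list[str] = []  # unbounded, slower storage

    def remember(self, item: str) -> None:
        self.context.append(item)
        while len(self.context) > self.context_limit:
            self.page_out()

    def page_out(self) -> None:
        # Evict the oldest context entry to the archive, like swapping a page to disk.
        self.archive.append(self.context.pop(0))

    def page_in(self, keyword: str) -> None:
        # Pull matching archived entries back into the context window on demand.
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        for m in hits:
            self.archive.remove(m)
            self.remember(m)
```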

Tiered Memory

Hot (context), warm (vector cache), and cold (archive) tiers with automatic promotion and demotion based on access patterns.
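
A sketch of promotion and demotion between tiers, using last-access time as the signal; the tier limits are arbitrary:

```python
import time

class TieredMemory:
    def __init__(self, hot_limit: int = 5, warm_limit: int = 50):
        self.hot_limit, self.warm_limit = hot_limit, warm_limit
        self.hot: dict[str, float] = {}   # in the context window
        self.warm: dict[str, float] = {}  # cached (e.g. a vector store)
        self.cold: dict[str, float] = {}  # archived

    def access(self, item: str) -> None:
        now = time.time()
        # Promote from colder tiers whenever an item is accessed.
        self.warm.pop(item, None)
        self.cold.pop(item, None)
        self.hot[item] = now
        self._demote()

    def _demote(self) -> None:
        # Demote least-recently-used items when a tier overflows.
        while len(self.hot) > self.hot_limit:
            oldest = min(self.hot, key=self.hot.get)
            self.warm[oldest] = self.hot.pop(oldest)
        while len(self.warm) > self.warm_limit:
            oldest = min(self.warm, key=self.warm.get)
            self.cold[oldest] = self.warm.pop(oldest)
```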

Self-Editing Memory

Agents that can update, consolidate, and restructure their own memories rather than append-only storage.
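
A sketch of the edit operations an agent might expose to itself as tools; the operation names are hypothetical:

```python
class EditableMemory:
    """Memory the agent can rewrite, not just append to."""

    def __init__(self):
        self.facts: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.facts[key] = value  # add or overwrite a fact

    def revise(self, key: str, value: str) -> None:
        if key in self.facts:
            self.facts[key] = value  # correct an outdated fact in place

    def consolidate(self, keys: list[str], merged_key: str, merged_value: str) -> None:
        # Collapse several related facts into one compact entry.
        for key in keys:
            self.facts.pop(key, None)
        self.facts[merged_key] = merged_value

    def forget(self, key: str) -> None:
        self.facts.pop(key, None)
```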

Dual Encoder Retrieval

Separate encoders for queries and memories enable asymmetric retrieval optimized for each direction.
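
A sketch of asymmetric retrieval with two hypothetical encoders that share a vector space; the toy hash embedding stands in for trained models:

```python
import math

def _hash_embed(text: str, dim: int = 16) -> list[float]:
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def encode_query(text: str) -> list[float]:
    """Hypothetical encoder tuned for short, question-like queries."""
    return _hash_embed("query: " + text)

def encode_memory(text: str) -> list[float]:
    """Hypothetical encoder tuned for longer, declarative memory passages."""
    return _hash_embed("memory: " + text)

def retrieve(query: str, memories: list[str], k: int = 3) -> list[str]:
    q = encode_query(query)

    def score(m: str) -> float:
        v = encode_memory(m)
        dot = sum(a * b for a, b in zip(q, v))
        norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    return sorted(memories, key=score, reverse=True)[:k]
```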

Temporal Knowledge Graphs

Store memories with explicit time relationships, enabling queries like "what did we discuss last week?" and detecting knowledge drift over time.
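
A sketch of time-stamped facts that can be filtered by when they came up; the field names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TemporalFact:
    subject: str
    relation: str
    obj: str
    discussed_at: datetime  # when the fact came up in conversation

class TemporalGraph:
    def __init__(self):
        self.facts: list[TemporalFact] = []

    def add(self, subject: str, relation: str, obj: str, when: datetime) -> None:
        self.facts.append(TemporalFact(subject, relation, obj, when))

    def discussed_between(self, start: datetime, end: datetime) -> list[TemporalFact]:
        return [f for f in self.facts if start <= f.discussed_at <= end]

graph = TemporalGraph()
graph.add("user", "works_on", "learn-guide", datetime.now() - timedelta(days=6))
# "What did we discuss last week?"
last_week = graph.discussed_between(datetime.now() - timedelta(days=7), datetime.now())
```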

Entity-Relationship Tracking

Extract entities (people, projects, concepts) and their relationships from conversations, building a queryable knowledge graph.
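
A sketch of the graph side, assuming a hypothetical extract_relations() LLM call that turns a conversation turn into (entity, relation, entity) triples:

```python
from collections import defaultdict

def extract_relations(turn: str) -> list[tuple[str, str, str]]:
    """Placeholder for an LLM extraction call returning (entity, relation, entity) triples."""
    return [("user", "prefers", "dark mode")] if "dark mode" in turn else []

class EntityGraph:
    def __init__(self):
        # entity -> list of (relation, other entity)
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def ingest(self, turn: str) -> None:
        for subject, relation, obj in extract_relations(turn):
            self.edges[subject].append((relation, obj))

    def about(self, entity: str) -> list[tuple[str, str]]:
        return self.edges.get(entity, [])

graph = EntityGraph()
graph.ingest("Please use dark mode for the docs site.")
print(graph.about("user"))  # -> [("prefers", "dark mode")]
```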

Time-Weighted Retrieval

Combine semantic similarity with recency, importance, and access frequency for more relevant recall.
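
A sketch of a blended scoring function; the weights, half-life, and component choices are arbitrary examples:

```python
import math
import time

def retrieval_score(similarity: float, last_access: float, importance: float,
                    access_count: int, now: float | None = None,
                    half_life_hours: float = 48.0) -> float:
    """Blend semantic similarity with recency, importance, and access frequency."""
    now = now or time.time()
    hours_since = (now - last_access) / 3600.0
    recency = 0.5 ** (hours_since / half_life_hours)  # exponential recency decay
    frequency = math.log1p(access_count)              # diminishing returns on repeat access
    return 0.5 * similarity + 0.25 * recency + 0.15 * importance + 0.10 * frequency
```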

ZEP-Style Memory

Automatic extraction of facts, entities, and temporal relations with bi-temporal modeling (when something happened vs. when it was recorded).
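
A sketch of a bi-temporal record (not Zep's actual schema): occurred_at is when something happened in the world, recorded_at is when the agent learned about it:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class BiTemporalFact:
    statement: str                  # e.g. "user switched teams"
    occurred_at: datetime           # valid time: when it happened
    recorded_at: datetime = field(default_factory=datetime.now)  # transaction time: when it was stored
    invalidated_at: datetime | None = None  # set when a later fact supersedes this one

def facts_known_as_of(facts: list[BiTemporalFact], as_of: datetime) -> list[BiTemporalFact]:
    """Reconstruct what the agent believed at a given point in time."""
    return [f for f in facts
            if f.recorded_at <= as_of and (f.invalidated_at is None or f.invalidated_at > as_of)]
```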

Memory Management

Production memory systems require active management to stay within token budgets while preserving valuable information.

Deduplication

Detect and merge semantically similar memories to prevent bloat. Use embedding similarity thresholds or LLM-based comparison.
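
A sketch of threshold-based deduplication; embed is any embedding function, and the 0.9 threshold is an arbitrary example:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedupe(memories: list[str], embed, threshold: float = 0.9) -> list[str]:
    """Keep a memory only if it is not a near-duplicate of one already kept."""
    kept: list[tuple[list[float], str]] = []
    for text in memories:
        vec = embed(text)
        if all(cosine(vec, existing) < threshold for existing, _ in kept):
            kept.append((vec, text))
    return [text for _, text in kept]
```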

Token Budgets

Allocate fixed token counts to different memory types. When budget is exceeded, compress or evict lowest-priority items.
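
A sketch of budget-fitting by priority, with a word count standing in for a real tokenizer:

```python
def count_tokens(text: str) -> int:
    """Rough word-count stand-in for a real tokenizer."""
    return len(text.split())

def fit_to_budget(memories: list[tuple[float, str]], budget: int) -> list[str]:
    """memories are (priority, text) pairs; keep the highest-priority ones that fit."""
    kept, used = [], 0
    for priority, text in sorted(memories, key=lambda m: m[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept

# Example per-type allocation (numbers are illustrative).
budgets = {"preferences": 200, "task_context": 800, "episodic": 500}
```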

Garbage Collection

Periodically scan memories for stale, redundant, or low-value entries, and evict them with LRU, LFU, or importance-weighted strategies.
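
A sketch of an importance-weighted sweep; the scoring weights and the fields each memory is assumed to carry are illustrative:

```python
import time

def gc_sweep(memories: list[dict], max_entries: int) -> list[dict]:
    """Keep the most valuable entries and drop the rest.

    Each memory dict is assumed to carry 'importance', 'access_count', and 'last_access'.
    """
    now = time.time()

    def value(m: dict) -> float:
        age_days = (now - m["last_access"]) / 86400.0
        # Reward importance and repeated access, penalize age.
        return m["importance"] + 0.5 * m["access_count"] - 0.1 * age_days

    return sorted(memories, key=value, reverse=True)[:max_entries]
```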

Priority Rules

Define what memories matter most: user preferences, task context, recent interactions, or explicitly pinned facts.

Adaptive Retention

Smart strategies for what to keep, summarize, or forget—mimicking how human memory naturally decays and consolidates.

Context Summarization

Progressive summarization: full detail for recent context, summaries for older conversations, key facts only for distant past.
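
A sketch of the three tiers, with summarize() and extract_facts() standing in for LLM calls; the tier sizes are arbitrary:

```python
def summarize(turns: list[str]) -> str:
    """Placeholder LLM call producing a paragraph-level summary."""
    return f"[summary of {len(turns)} turns]"

def extract_facts(turns: list[str]) -> str:
    """Placeholder LLM call keeping only durable key facts."""
    return f"[key facts from {len(turns)} turns]"

def build_context(turns: list[str], recent: int = 10, mid: int = 40) -> str:
    """Full detail for recent turns, summaries for older ones, key facts for the distant past."""
    recent_turns = turns[-recent:]
    mid_turns = turns[-(recent + mid):-recent]
    old_turns = turns[:-(recent + mid)]
    parts = []
    if old_turns:
        parts.append(extract_facts(old_turns))
    if mid_turns:
        parts.append(summarize(mid_turns))
    parts.extend(recent_turns)
    return "\n".join(parts)
```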

Entity Extraction

Automatically identify and store important entities (names, preferences, decisions) separately from raw conversation logs.

Decay Strategies

Exponential or logarithmic decay functions reduce memory importance over time unless reinforced by access or explicit importance markers.
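
A sketch of exponential decay with reinforcement on access; the half-life and boost values are arbitrary:

```python
import time

class DecayingMemory:
    def __init__(self, text: str, importance: float = 1.0, half_life_days: float = 14.0):
        self.text = text
        self.importance = importance
        self.half_life_days = half_life_days
        self.last_reinforced = time.time()

    def current_importance(self, now: float | None = None) -> float:
        """Importance halves every half_life_days unless the memory is reinforced."""
        now = now or time.time()
        age_days = (now - self.last_reinforced) / 86400.0
        return self.importance * 0.5 ** (age_days / self.half_life_days)

    def reinforce(self, boost: float = 0.5) -> None:
        """Accessing or re-mentioning a memory resets the clock and strengthens it."""
        self.importance = self.current_importance() + boost
        self.last_reinforced = time.time()
```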

Memory System Visualizer

An interactive demo of how short-term and long-term memory work together: short-term memory holds the current context in a small number of slots (five in the demo), and when it is full, the oldest memories migrate to long-term storage, which persists facts such as user preferences across sessions.

Key Takeaways

  • Memory extends agent capabilities beyond the context window
  • Combine multiple memory types for best results
  • Memory retrieval adds latency; balance comprehensiveness with speed
  • Consider privacy and data retention when storing memories
  • Hybrid patterns like MemGPT enable agents to manage their own memory like an operating system
  • Temporal knowledge graphs add time-awareness for more contextual recall