Memory Compression and Summarization

Medium
Agents

Memory Compression in Long-Running Agents

Long-running conversations eventually exceed the model's context window, so agents must compress older history intelligently.

Task

Implement MemoryCompressor that:

  1. Detects when to compress (token count exceeds a threshold; a minimal check is sketched after this list).
  2. Compresses old messages into a summary while keeping recent N messages verbatim.
  3. Implements hierarchical compression for very long histories.
  4. Tracks summary history for auditability.
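
A minimal sketch of the detection check in item 1, written as a standalone function. It reuses the 4-characters-per-token estimate from the starter code below; the default threshold mirrors compression_threshold:

from typing import Dict, List

def should_compress(messages: List[Dict], threshold: int = 1500) -> bool:
    # Estimate total tokens across all message contents (1 token ≈ 4 chars).
    total = sum(len(m['content']) // 4 for m in messages)
    return total > threshold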

Constraints

  • Always keep the last N messages verbatim (default 5).
  • Summary message has role 'system' and content 'Previous context: {summary}' (illustrated below).
  • Hierarchical compression: compress summaries themselves when needed.
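
A minimal sketch of how the compressed output can be assembled under constraints 1 and 2; assemble_compressed is a hypothetical helper, and summary is assumed to have been produced already (e.g. by llm_fn):

from typing import Dict, List

def assemble_compressed(messages: List[Dict], summary: str, keep_last_n: int = 5) -> List[Dict]:
    # One system message carrying the summary, then the last N messages verbatim.
    summary_msg = {'role': 'system', 'content': f'Previous context: {summary}'}
    return [summary_msg] + messages[-keep_last_n:]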

Examples

Example 1:
Input: compressor.compress([{'role':'user','content':'msg1'}, ...(20 messages)...], keep_last_n=3)
Output: [{'role':'system','content':'Previous context: ...'}, last_3_messages]
Explanation: The 17 older messages are summarized into a single system message; the 3 most recent are kept verbatim.

Starter Code

from typing import Callable, Dict, List

class MemoryCompressor:
    def __init__(self, llm_fn: Callable[[str], str], max_tokens: int = 2000, compression_threshold: int = 1500):
        self.llm_fn = llm_fn  # LLM call: prompt string in, summary string out
        self.max_tokens = max_tokens  # hard context budget
        self.compression_threshold = compression_threshold  # compress once estimated tokens exceed this
        self.summary_history: List[str] = []  # audit trail of past summaries

    def estimate_tokens(self, text: str) -> int:
        # Approximate: 1 token ≈ 4 characters
        # (a real implementation might use a tokenizer such as tiktoken)
        return len(text) // 4

    def should_compress(self, messages: List[Dict]) -> bool:
        # TODO: return True when the estimated token total exceeds compression_threshold
        pass

    def compress(self, messages: List[Dict], keep_last_n: int = 5) -> List[Dict]:
        # TODO: Summarize old messages into one system message; keep the last N verbatim
        pass

    def hierarchical_compress(self, messages: List[Dict], levels: int = 2) -> Dict:
        # TODO: Multi-level compression for very long histories (re-summarize the summaries)
        pass
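
For reference, one possible implementation follows. It is a sketch, not the only correct solution: it assumes llm_fn maps a single prompt string to a summary string, and the prompt wording and the keys of the Dict returned by hierarchical_compress ('messages', 'summaries') are illustrative choices the spec leaves open.

from typing import Callable, Dict, List

class MemoryCompressor:
    def __init__(self, llm_fn: Callable[[str], str], max_tokens: int = 2000,
                 compression_threshold: int = 1500):
        self.llm_fn = llm_fn
        self.max_tokens = max_tokens
        self.compression_threshold = compression_threshold
        self.summary_history: List[str] = []  # audit trail (task item 4)

    def estimate_tokens(self, text: str) -> int:
        # Approximate: 1 token ≈ 4 characters
        return len(text) // 4

    def should_compress(self, messages: List[Dict]) -> bool:
        total = sum(self.estimate_tokens(m['content']) for m in messages)
        return total > self.compression_threshold

    def compress(self, messages: List[Dict], keep_last_n: int = 5) -> List[Dict]:
        if len(messages) <= keep_last_n:
            return messages  # nothing old enough to summarize
        old, recent = messages[:-keep_last_n], messages[-keep_last_n:]
        transcript = '\n'.join(f"{m['role']}: {m['content']}" for m in old)
        summary = self.llm_fn('Summarize this conversation concisely:\n' + transcript)
        self.summary_history.append(summary)  # keep for auditability
        return [{'role': 'system', 'content': f'Previous context: {summary}'}] + recent

    def hierarchical_compress(self, messages: List[Dict], levels: int = 2) -> Dict:
        # Level 1 summarizes old messages; each extra level re-summarizes the
        # summary itself while it still exceeds the threshold (constraint 3).
        compressed = self.compress(messages)
        if not compressed or compressed[0].get('role') != 'system':
            return {'messages': compressed, 'summaries': list(self.summary_history)}
        summary = self.summary_history[-1]
        for _ in range(levels - 1):
            if self.estimate_tokens(summary) <= self.compression_threshold:
                break  # summary already fits the budget
            summary = self.llm_fn('Condense this summary further:\n' + summary)
            self.summary_history.append(summary)
            compressed[0] = {'role': 'system', 'content': f'Previous context: {summary}'}
        return {'messages': compressed, 'summaries': list(self.summary_history)}

A quick sanity check with a stub LLM:

def stub_llm(prompt: str) -> str:
    return 'brief summary'

compressor = MemoryCompressor(stub_llm)
history = [{'role': 'user', 'content': f'msg{i} ' * 100} for i in range(20)]
assert compressor.should_compress(history)
result = compressor.compress(history, keep_last_n=3)
assert result[0]['role'] == 'system' and len(result) == 4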