Memory Compression and Summarization

Medium
Agents

Memory Compression in Long-Running Agents

Long-running conversations eventually exceed the model's context window, so agents must compress older history intelligently.

Task

Implement MemoryCompressor that:

  1. Detects when to compress (token count exceeds a threshold; a minimal check is sketched after this list).
  2. Compresses old messages into a summary while keeping recent N messages verbatim.
  3. Implements hierarchical compression for very long histories.
  4. Tracks summary history for auditability.
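
A minimal sketch of the detection check in item 1, written as a standalone function. It reuses the 4-characters-per-token estimate from the starter code below; the default threshold mirrors compression_threshold:

from typing import Dict, List

def should_compress(messages: List[Dict], threshold: int = 1500) -> bool:
    # Estimate total tokens across all message contents (1 token ≈ 4 chars).
    total = sum(len(m['content']) // 4 for m in messages)
    return total > threshold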

Constraints

  • Always keep the last N messages verbatim (default 5).
  • Summary message has role 'system' and content 'Previous context: {summary}' (illustrated below).
  • Hierarchical compression: compress summaries themselves when needed.
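
A minimal sketch of how the compressed output can be assembled under constraints 1 and 2; assemble_compressed is a hypothetical helper, and summary is assumed to have been produced already (e.g. by llm_fn):

from typing import Dict, List

def assemble_compressed(messages: List[Dict], summary: str, keep_last_n: int = 5) -> List[Dict]:
    # One system message carrying the summary, then the last N messages verbatim.
    summary_msg = {'role': 'system', 'content': f'Previous context: {summary}'}
    return [summary_msg] + messages[-keep_last_n:]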

Examples

Example 1:
Input: compressor.compress([{'role':'user','content':'msg1'}, ...(20 messages)...], keep_last_n=3)
Output: [{'role':'system','content':'Previous context: ...'}, last_3_messages]
Explanation: The 17 older messages are summarized into a single system message; the 3 most recent are kept verbatim.

Starter Code

from typing import Callable, Dict, List

class MemoryCompressor:
    def __init__(self, llm_fn: Callable[[str], str], max_tokens: int = 2000, compression_threshold: int = 1500):
        self.llm_fn = llm_fn  # LLM call: prompt string in, summary string out
        self.max_tokens = max_tokens  # hard context budget
        self.compression_threshold = compression_threshold  # compress once estimated tokens exceed this
        self.summary_history: List[str] = []  # audit trail of past summaries

    def estimate_tokens(self, text: str) -> int:
        # Approximate: 1 token ≈ 4 characters
        # (a real implementation might use a tokenizer such as tiktoken)
        return len(text) // 4

    def should_compress(self, messages: List[Dict]) -> bool:
        # TODO: return True when the estimated token total exceeds compression_threshold
        pass

    def compress(self, messages: List[Dict], keep_last_n: int = 5) -> List[Dict]:
        # TODO: Summarize old messages into one system message; keep the last N verbatim
        pass

    def hierarchical_compress(self, messages: List[Dict], levels: int = 2) -> Dict:
        # TODO: Multi-level compression for very long histories (re-summarize the summaries)
        pass
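
For reference, one possible implementation follows. It is a sketch, not the only correct solution: it assumes llm_fn maps a single prompt string to a summary string, and the prompt wording and the keys of the Dict returned by hierarchical_compress ('messages', 'summaries') are illustrative choices the spec leaves open.

from typing import Callable, Dict, List

class MemoryCompressor:
    def __init__(self, llm_fn: Callable[[str], str], max_tokens: int = 2000,
                 compression_threshold: int = 1500):
        self.llm_fn = llm_fn
        self.max_tokens = max_tokens
        self.compression_threshold = compression_threshold
        self.summary_history: List[str] = []  # audit trail (task item 4)

    def estimate_tokens(self, text: str) -> int:
        # Approximate: 1 token ≈ 4 characters
        return len(text) // 4

    def should_compress(self, messages: List[Dict]) -> bool:
        total = sum(self.estimate_tokens(m['content']) for m in messages)
        return total > self.compression_threshold

    def compress(self, messages: List[Dict], keep_last_n: int = 5) -> List[Dict]:
        if len(messages) <= keep_last_n:
            return messages  # nothing old enough to summarize
        old, recent = messages[:-keep_last_n], messages[-keep_last_n:]
        transcript = '\n'.join(f"{m['role']}: {m['content']}" for m in old)
        summary = self.llm_fn('Summarize this conversation concisely:\n' + transcript)
        self.summary_history.append(summary)  # keep for auditability
        return [{'role': 'system', 'content': f'Previous context: {summary}'}] + recent

    def hierarchical_compress(self, messages: List[Dict], levels: int = 2) -> Dict:
        # Level 1 summarizes old messages; each extra level re-summarizes the
        # summary itself while it still exceeds the threshold (constraint 3).
        compressed = self.compress(messages)
        if not compressed or compressed[0].get('role') != 'system':
            return {'messages': compressed, 'summaries': list(self.summary_history)}
        summary = self.summary_history[-1]
        for _ in range(levels - 1):
            if self.estimate_tokens(summary) <= self.compression_threshold:
                break  # summary already fits the budget
            summary = self.llm_fn('Condense this summary further:\n' + summary)
            self.summary_history.append(summary)
            compressed[0] = {'role': 'system', 'content': f'Previous context: {summary}'}
        return {'messages': compressed, 'summaries': list(self.summary_history)}

A quick sanity check with a stub LLM:

def stub_llm(prompt: str) -> str:
    return 'brief summary'

compressor = MemoryCompressor(stub_llm)
history = [{'role': 'user', 'content': f'msg{i} ' * 100} for i in range(20)]
assert compressor.should_compress(history)
result = compressor.compress(history, keep_last_n=3)
assert result[0]['role'] == 'system' and len(result) == 4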