The AI Interview - Master AI/ML Interviews

Implement latency optimization for agent systems:

Optimization Strategies:

Caching: Memoize deterministic tool calls
Prefetching: Load embeddings before needed
Parallelization: Run independent calls concurrently
Plan Optimization: Reorder for parallelism

Methods:

cached_tool_call(tool_func, params, ttl): Cache with TTL
- Hash params as cache key
- Check expiry before returning
prefetch_embeddings(texts): Async load
parallel_tool_calls(calls): ThreadPool execution
optimize_plan(plan): Dependency analysis, reorder
get_stats(): Hit rates, avg latency

Cache Key: hash(tool_name + canonical_json(params))

Examples

Example 1:

Input:

opt = LatencyOptimizer(); result1 = opt.cached_tool_call(lambda: 'expensive', {}, ttl_seconds=60); result2 = opt.cached_tool_call(lambda: 'different', {}, 60); result1 == result2

Output: True

Explanation: Same params hit cache, return cached result

Starter Code

class LatencyOptimizer:
    """
    Optimize agent execution latency through caching, prefetching, and parallelization.
    """
    
    def __init__(self):
        self.tool_cache = {}
        self.embedding_cache = {}
        self.prefetch_queue = []
        self.execution_stats = {}
    
    def cached_tool_call(self, tool_func, params, ttl_seconds=60):
        """
        Execute tool with caching.
        Returns cached result if available and not expired.
        """
        # Your implementation here
        pass
    
    def prefetch_embeddings(self, texts):
        """Prefetch embeddings in background"""
        # Your implementation here
        pass
    
    def parallel_tool_calls(self, tool_calls):
        """
        Execute independent tool calls in parallel.
        tool_calls: list of {'tool': func, 'params': {...}}
        """
        # Your implementation here
        pass
    
    def optimize_plan(self, execution_plan):
        """
        Optimize execution plan for latency.
        Reorder independent operations, add caching hints.
        """
        # Your implementation here
        pass
    
    def get_stats(self):
        """Get cache hit rates and latency stats"""
        # Your implementation here
        pass

Agent Latency Optimizer

Examples

Starter Code