Full Agent Observability Pipeline with Cost Tracking and Alerts

Hard
Agents
Production AI agents need comprehensive telemetry to debug issues, track costs, and enforce SLAs.

Task

Build an AgentObservabilityPipeline class that:

  1. Computes LLM call cost from pricing table.
  2. Records LLM and tool calls with full metadata.
  3. Fires alerts when cumulative run cost or error rate breaches thresholds.
  4. Provides a cross-run dashboard: p50/p95 latency, avg cost, tool error rate.
  5. Exports all records for a run as JSONL.

Non-Functional Requirements

  • Cost accurate to 6 decimal places.
  • Alert fires synchronously on record_llm_call.
  • Dashboard must handle 10,000+ runs efficiently.
  • JSONL: one JSON object per record per line.
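
The p50/p95 figures in the dashboard need a percentile function. The problem does not mandate a method; a minimal sketch using the nearest-rank definition:

```python
import math

def percentile(values, p):
    """Nearest-rank p-th percentile of values (p in [0, 100])."""
    if not values:
        return 0.0
    ordered = sorted(values)
    # 1-based nearest rank is ceil(p/100 * n); clamp to a valid 0-based index.
    k = max(0, min(len(ordered) - 1, math.ceil(p / 100 * len(ordered)) - 1))
    return ordered[k]

latencies = [120.0, 80.0, 200.0, 95.0]
print(percentile(latencies, 50))   # 95.0
print(percentile(latencies, 95))   # 200.0
```

Sorting per call is O(n log n); for the 10,000+ runs requirement, pre-aggregating latencies per run (or keeping them sorted incrementally) avoids re-sorting the full history on every dashboard call.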

Examples

Example 1:
Input: pipeline.compute_cost('gpt-4o', 1_000_000, 0) # pricing: {'gpt-4o': {'input': 5.0, 'output': 15.0}}
Output: 5.0
Explanation: 1M input tokens × $5.00 per 1M = $5.00
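
The arithmetic above suggests a compute_cost sketch, assuming the pricing table is USD per 1M tokens as in the example (shown as a free function rather than the method, for illustration):

```python
# Sketch of the cost formula, assuming pricing is USD per 1M tokens.
def compute_cost(pricing, model, input_tokens, output_tokens):
    rates = pricing[model]
    cost = (input_tokens / 1_000_000) * rates['input'] \
         + (output_tokens / 1_000_000) * rates['output']
    return round(cost, 6)  # NFR: accurate to 6 decimal places

pricing = {'gpt-4o': {'input': 5.0, 'output': 15.0}}
print(compute_cost(pricing, 'gpt-4o', 1_000_000, 0))      # 5.0
print(compute_cost(pricing, 'gpt-4o', 500_000, 100_000))  # 4.0
```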

Starter Code

from typing import Dict, List, Any, Optional
from dataclasses import dataclass, field
import time, json

@dataclass
class LLMCallRecord:
    call_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cost_usd: float
    timestamp: float
    run_id: str
    step: int
    cached: bool = False

@dataclass
class ToolCallRecord:
    call_id: str
    tool_name: str
    latency_ms: float
    success: bool
    error: Optional[str]
    run_id: str
    step: int

class AgentObservabilityPipeline:
    def __init__(self, pricing: Dict[str, Dict]):
        self.pricing = pricing
        self.llm_calls: List[LLMCallRecord] = []
        self.tool_calls: List[ToolCallRecord] = []
        self.run_metadata: Dict[str, Dict] = {}
        self.alerts: List[Dict] = []
        self.alert_thresholds: Dict[str, float] = {}

    def compute_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
        pass

    def record_llm_call(self, record: LLMCallRecord) -> None:
        pass

    def record_tool_call(self, record: ToolCallRecord) -> None:
        pass

    def start_run(self, run_id: str, metadata: Dict) -> None:
        pass

    def end_run(self, run_id: str, status: str) -> Dict:
        pass

    def set_alert(self, metric: str, threshold: float) -> None:
        pass

    def dashboard(self) -> Dict:
        pass

    def export_jsonl(self, run_id: str) -> str:
        pass
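
For the export_jsonl method, one workable approach is `dataclasses.asdict` plus `json.dumps` per record. A self-contained sketch (the trimmed ToolCallRecord here is for illustration only):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ToolCallRecord:
    call_id: str
    tool_name: str
    success: bool

# One JSON object per record per line, as the NFRs require.
def export_jsonl(records):
    return "\n".join(json.dumps(asdict(r)) for r in records)

records = [ToolCallRecord("c1", "search", True),
           ToolCallRecord("c2", "fetch", False)]
print(export_jsonl(records))
```

Each line parses independently with `json.loads`, which is what makes JSONL convenient for streaming ingestion by log pipelines.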