Agent Cost Optimization
Running AI agents at scale is expensive. This question covers systematic cost reduction strategies.
Task
Build a CostOptimizer that:
- Routes tasks to the cheapest model meeting quality/latency requirements.
- Implements model cascade (try cheap → escalate if quality fails).
- Compresses prompts to reduce input tokens.
- Batches similar tasks for API efficiency.
- Reports total cost savings vs always-use-frontier baseline.
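The prompt-compression step listed above can be sketched conservatively. `optimize_prompt` here is illustrative, not the required implementation; it applies only meaning-preserving transformations (whitespace squeezing, dropping verbatim duplicate lines), in line with the later constraint that compression must not alter semantics:

```python
import re

def optimize_prompt(prompt: str) -> str:
    """Compress whitespace and drop exact duplicate lines.

    A conservative sketch: only transformations that cannot change
    the prompt's semantic meaning. Real systems may additionally
    shorten few-shot examples or paraphrase instructions.
    """
    seen = set()
    lines = []
    for line in prompt.splitlines():
        line = re.sub(r'[ \t]+', ' ', line).strip()
        if line and line in seen:
            continue  # skip a line we have already emitted verbatim
        seen.add(line)
        lines.append(line)
    return '\n'.join(lines)
```

Fewer input tokens at identical meaning is the cheapest win available, since it needs no quality re-check.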
Non-Functional Requirements
- Target 60-80% cost reduction vs frontier-only baseline.
- Quality SLA: never fall below the task's quality_requirement.
- Latency SLA: never exceed latency_budget_ms.
Constraints
- Model selection: cheapest tier meeting quality AND latency constraints.
- Cascade: nano → small → large → frontier (stop at first passing quality check).
- Prompt optimization must not alter semantic meaning.
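The model-selection constraint can be sketched as a cheapest-first scan over the tier order. `Model` below is a trimmed stand-in for the `ModelConfig` dataclass in the starter code, kept minimal so the sketch is self-contained:

```python
from dataclasses import dataclass
from enum import Enum

class ModelTier(Enum):
    NANO = 'nano'
    SMALL = 'small'
    LARGE = 'large'
    FRONTIER = 'frontier'

@dataclass
class Model:
    tier: ModelTier
    input_cost_per_1m: float
    output_cost_per_1m: float
    avg_latency_ms: float
    quality_score: float

# Cheapest-first tier order from the constraints above.
TIER_ORDER = [ModelTier.NANO, ModelTier.SMALL, ModelTier.LARGE, ModelTier.FRONTIER]

def select_model(models: dict, quality_req: float, latency_budget_ms: float):
    """Return the cheapest-tier model meeting quality AND latency, else None."""
    for tier in TIER_ORDER:
        m = models.get(tier)
        if m and m.quality_score >= quality_req and m.avg_latency_ms <= latency_budget_ms:
            return m
    return None
```

Because tiers are scanned in ascending cost order, the first model that satisfies both constraints is also the cheapest eligible one.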
Examples
Example 1:
Input:
task = TaskProfile('1', 'classify', 'simple', 100, 50, quality_requirement=0.7, latency_budget_ms=2000, cost_budget_usd=0.01)
optimizer.select_model(task)
Output:
ModelConfig(tier=ModelTier.NANO, name='gpt-4o-mini', ...)
Explanation: A simple task with a 0.7 quality requirement is met by the nano tier; the cheapest eligible model is selected.
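The budget check behind this example is linear token pricing. A sketch of the arithmetic, using the 100 input / 50 output token counts from Example 1 and hypothetical nano-tier rates (the $0.15 / $0.60 per-1M figures are illustrative, not given in the problem):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_cost_per_1m: float, output_cost_per_1m: float) -> float:
    # (tokens / 1M) * per-1M rate, summed over both directions.
    return (input_tokens / 1_000_000) * input_cost_per_1m \
         + (output_tokens / 1_000_000) * output_cost_per_1m

cost = estimate_cost(100, 50, 0.15, 0.60)  # → 4.5e-05 USD, well under the $0.01 budget
```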
Starter Code
from typing import Dict, List, Any, Optional, Tuple, Callable
from dataclasses import dataclass, field
from enum import Enum


class ModelTier(Enum):
    NANO = 'nano'          # Cheapest: simple tasks
    SMALL = 'small'        # Balanced
    LARGE = 'large'        # Expensive: complex reasoning
    FRONTIER = 'frontier'  # Most capable, most expensive


@dataclass
class ModelConfig:
    tier: ModelTier
    name: str
    input_cost_per_1m: float   # USD per 1M input tokens
    output_cost_per_1m: float  # USD per 1M output tokens
    max_context: int
    avg_latency_ms: float
    quality_score: float       # 0-1


@dataclass
class TaskProfile:
    task_id: str
    task_type: str
    complexity: str             # simple|medium|complex|critical
    estimated_input_tokens: int
    estimated_output_tokens: int
    quality_requirement: float  # minimum quality, 0-1
    latency_budget_ms: float
    cost_budget_usd: float


class CostOptimizer:
    def __init__(self, models: List[ModelConfig]):
        self.models = {m.tier: m for m in models}
        self.routing_history: List[Dict] = []
        self.cost_savings: float = 0.0

    def select_model(self, task: TaskProfile) -> ModelConfig:
        # TODO: Select the cheapest model meeting quality and latency requirements
        pass

    def estimate_cost(self, model: ModelConfig, task: TaskProfile) -> float:
        # TODO: Estimate USD cost from token counts and per-1M rates
        pass

    def route_with_cascade(self, task: TaskProfile, agent_fn: Callable) -> Dict:
        # TODO: Try the cheaper model first; escalate if quality is insufficient
        pass

    def optimize_prompt(self, prompt: str, target_reduction: float = 0.3) -> str:
        # TODO: Remove redundancy, compress whitespace, shorten examples
        pass

    def batch_similar_tasks(self, tasks: List[TaskProfile]) -> List[List[TaskProfile]]:
        # TODO: Group similar tasks for batching
        pass

    def report(self) -> Dict:
        # TODO: Return savings vs the baseline (always using frontier)
        pass