The AI Interview

Implement Tool Misuse Detection System

Hard

Agents

Tool Misuse Detection in AI Agents

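AI agents can misuse the tools they are given, for example by evading rate limits, passing injected prompts through tool arguments, calling tools outside their authorized scope, or exfiltrating data.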

Task

Build a ToolMisuseDetector class that detects the following kinds of misuse:

Rate limit violations: A tool is called more often than its configured limit allows.
Blocked patterns: Tool args match a disallowed pattern.
Prompt injection: Tool args contain LLM instructions (e.g. "Ignore previous", "Act as").
Scope creep: An agent uses a tool outside its authorized set.
Data exfiltration: Large data writes go to external destinations.
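The prompt injection and scope creep checks can be sketched as two small helpers. This is a minimal sketch, not a reference solution: the names INJECTION_RE, iter_strings, find_injection, and is_out_of_scope are hypothetical, the phrase list covers only the two phrases named above, and treating an agent with no registered tool set as unrestricted is an assumption.

```python
import re
from typing import Any, Dict, Iterator, Optional, Set

# Assumed phrase list: the task names only "Ignore previous" and "Act as".
INJECTION_RE = re.compile(
    r"\bignore\s+(?:all\s+)?previous\b|\bact\s+as\b",
    re.IGNORECASE,
)


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string nested inside dicts, lists, tuples, or sets."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            yield from iter_strings(item)


def find_injection(args: Dict[str, Any]) -> Optional[str]:
    """Return the first injection phrase found in the args, or None."""
    for text in iter_strings(args):
        match = INJECTION_RE.search(text)
        if match:
            return match.group(0)
    return None


def is_out_of_scope(agent_id: str, tool_name: str,
                    authorized: Dict[str, Set[str]]) -> bool:
    """True if the agent has a registered tool set that excludes this tool."""
    allowed = authorized.get(agent_id)
    # Assumption: agents without a registered set are not restricted.
    return allowed is not None and tool_name not in allowed
```

A stricter deployment might deny unregistered agents by default; that choice trades false negatives for false positives and is left open by the task.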

Non-Functional Requirements

Detection latency < 10ms per call.
False positive rate < 5%.
All alerts include severity and recommended action.
Detection rules must be configurable at runtime.
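One way to approach the last two requirements is a mutable policy table that maps each misuse type to a severity and action, plus a small timing helper for checking the latency budget. This is a sketch under assumptions: policy, set_policy, and timed_ms are hypothetical names, and only the blocked_pattern row is taken from Example 1; the other rows are illustrative defaults.

```python
import time
from typing import Any, Callable, Dict, Tuple

SEVERITIES = ("low", "medium", "high", "critical")
ACTIONS = ("allow", "warn", "block", "terminate")

# Hypothetical policy table: misuse_type -> (severity, recommended_action).
# Only the blocked_pattern row comes from Example 1; the rest are assumed.
policy: Dict[str, Tuple[str, str]] = {
    "blocked_pattern": ("critical", "block"),
    "rate_limit": ("medium", "warn"),
    "prompt_injection": ("high", "block"),
    "scope_creep": ("high", "block"),
    "exfiltration": ("critical", "terminate"),
}


def set_policy(misuse_type: str, severity: str, action: str) -> None:
    """Update one rule in place so later calls see it without a restart."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    policy[misuse_type] = (severity, action)


def timed_ms(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

Wrapping analyze_call in timed_ms during tests gives a quick check against the 10ms budget; the false positive rate has to be measured against labeled calls and cannot be verified from code alone.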

Constraints

Rate limit window: sliding window (not fixed).
Prompt injection: detect common jailbreak phrases.
Exfiltration: flag any write of >10KB to external URLs.
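The sliding-window and exfiltration constraints can be illustrated as follows. This is a rough sketch under stated assumptions: SlidingWindowCounter, INTERNAL_HOSTS, and the helper functions are hypothetical names, timestamps are assumed to arrive in non-decreasing order, 10KB is read as 10 * 1024 bytes of JSON-serialized args, and any call carrying an external http(s) URL is treated as a write.

```python
import json
from collections import deque
from typing import Any, Deque, Dict
from urllib.parse import urlparse


class SlidingWindowCounter:
    """Counts calls in the last window_s seconds (not a fixed bucket)."""

    def __init__(self, max_calls: int, window_s: float) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._stamps: Deque[float] = deque()

    def hit(self, now: float) -> bool:
        """Record a call at `now`; return True if the limit is exceeded."""
        cutoff = now - self.window_s
        # Drop timestamps that have slid out of the window.
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        self._stamps.append(now)
        return len(self._stamps) > self.max_calls


EXFIL_THRESHOLD_BYTES = 10 * 1024            # assumed meaning of "10KB"
INTERNAL_HOSTS = {"localhost", "127.0.0.1"}  # hypothetical allowlist


def is_external_url(value: str) -> bool:
    """True for http(s) URLs whose host is not on the internal allowlist."""
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or parsed.hostname is None:
        return False
    return parsed.hostname not in INTERNAL_HOSTS


def payload_size(args: Dict[str, Any]) -> int:
    """Approximate payload size as the UTF-8 length of the JSON-encoded args."""
    return len(json.dumps(args, default=str).encode("utf-8"))


def looks_like_exfiltration(args: Dict[str, Any]) -> bool:
    """Flag calls that target an external URL with more than 10KB of args."""
    has_external = any(
        isinstance(v, str) and is_external_url(v) for v in args.values()
    )
    return has_external and payload_size(args) > EXFIL_THRESHOLD_BYTES
```

Note that hit records rejected calls as well, so a caller that keeps retrying stays over the limit; whether rejected calls should count toward the window is a design choice the task leaves open.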

Examples

Example 1:

Input:

detector.add_blocked_pattern('sql_query', 'DROP TABLE')
call = ToolCall('c1', 'sql_query', {'query': 'DROP TABLE users'}, None, time.time(), 'agent1', 'run1')
detector.analyze_call(call)

Output (key fields only): [MisuseAlert(misuse_type='blocked_pattern', severity='critical', recommended_action='block')]

Explanation: The query contains DROP TABLE, which matches the blocked pattern registered for sql_query, so the detector returns a critical alert with the recommended action block.
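The matching step in Example 1 can be sketched in isolation. This is a minimal illustration, not the expected solution: the module-level blocked dict and the first_blocked_match helper are hypothetical, and treating each pattern as a case-insensitive regular expression searched across all argument values is an assumption.

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for ToolMisuseDetector.blocked_patterns.
blocked: Dict[str, List[str]] = {}


def add_blocked_pattern(tool: str, arg_pattern: str) -> None:
    """Register a disallowed pattern for one tool."""
    blocked.setdefault(tool, []).append(arg_pattern)


def first_blocked_match(tool: str, args: Dict[str, Any]) -> Optional[str]:
    """Return the first registered pattern found in the args, or None."""
    haystack = " ".join(str(value) for value in args.values())
    for pattern in blocked.get(tool, []):
        # Assumption: patterns are regexes matched case-insensitively.
        if re.search(pattern, haystack, re.IGNORECASE):
            return pattern
    return None


add_blocked_pattern("sql_query", "DROP TABLE")
print(first_blocked_match("sql_query", {"query": "DROP TABLE users"}))
```

Patterns are scoped per tool here, so the same string passed to a different tool produces no match.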

Starter Code

from typing import Dict, List, Any, Optional, Tuple
from dataclasses import dataclass, field
from datetime import datetime
import re
import time  # used by the example above to timestamp calls

@dataclass
class ToolCall:
    call_id: str
    tool_name: str
    args: Dict
    result: Any
    timestamp: float
    agent_id: str
    run_id: str

@dataclass
class MisuseAlert:
    alert_id: str
    call_id: str
    misuse_type: str
    severity: str  # low|medium|high|critical
    description: str
    recommended_action: str  # allow|warn|block|terminate

class ToolMisuseDetector:
    def __init__(self):
        self.call_history: List[ToolCall] = []
        self.alerts: List[MisuseAlert] = []
        self.rate_limits: Dict[str, Tuple[int, float]] = {}  # tool -> (max_calls, window_s)
        self.blocked_patterns: Dict[str, List[str]] = {}     # tool -> [arg patterns]

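    def configure_rate_limit(self, tool: str, max_calls: int, window_s: float) -> None:
        # TODO: Store (max_calls, window_s) for this tool
        pass

    def add_blocked_pattern(self, tool: str, arg_pattern: str) -> None:
        # TODO: Register a disallowed argument pattern for this tool
        pass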

    def analyze_call(self, call: ToolCall) -> List[MisuseAlert]:
        # TODO: Run all detection checks, return alerts
        pass

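    def _check_rate_limit(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Count this tool's calls inside the sliding window
        pass

    def _check_blocked_patterns(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Match call args against the tool's blocked patterns
        pass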

    def _check_prompt_injection(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Detect if tool args contain LLM instruction injection
        pass

    def _check_scope_creep(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Detect if agent is using tools beyond its authorized scope
        pass

    def _check_exfiltration(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Detect potential data exfiltration patterns
        pass