Implement Tool Misuse Detection System

Hard
Agents

Tool Misuse Detection in AI Agents

Agents can misuse tools: rate limit evasion, prompt injection, scope creep, data exfiltration.

Task

Build ToolMisuseDetector that detects:

  1. Rate limit violations: Tool called too frequently.
  2. Blocked patterns: Args matching disallowed patterns.
  3. Prompt injection: Tool args containing LLM instructions (Ignore previous, Act as).
  4. Scope creep: Agent using tools outside its authorized set.
  5. Data exfiltration: Large data writes to external destinations.

Non-Functional Requirements

  • Detection latency < 10ms per call.
  • False positive rate < 5%.
  • All alerts include severity and recommended action.
  • Detection rules must be configurable at runtime.

Constraints

  • Rate limit window: sliding window (not fixed).
  • Prompt injection: detect common jailbreak phrases.
  • Exfiltration: flag any write of >10KB to external URLs.

Examples

Example 1:
Input: detector.add_blocked_pattern('sql_query', 'DROP TABLE') call = ToolCall('c1', 'sql_query', {'query': 'DROP TABLE users'}, None, time.time(), 'agent1', 'run1') detector.analyze_call(call)
Output: [MisuseAlert(misuse_type='blocked_pattern', severity='critical', recommended_action='block')]
Explanation: DROP TABLE matches blocked pattern; critical severity SQL injection.

Starter Code

from typing import Dict, List, Any, Optional, Tuple
from dataclasses import dataclass, field
from datetime import datetime
import re

@dataclass
class ToolCall:
    call_id: str
    tool_name: str
    args: Dict
    result: Any
    timestamp: float
    agent_id: str
    run_id: str

@dataclass
class MisuseAlert:
    alert_id: str
    call_id: str
    misuse_type: str
    severity: str  # low|medium|high|critical
    description: str
    recommended_action: str  # allow|warn|block|terminate

class ToolMisuseDetector:
    def __init__(self):
        self.call_history: List[ToolCall] = []
        self.alerts: List[MisuseAlert] = []
        self.rate_limits: Dict[str, Tuple[int, float]] = {}  # tool -> (max_calls, window_s)
        self.blocked_patterns: Dict[str, List[str]] = {}     # tool -> [arg patterns]

    def configure_rate_limit(self, tool: str, max_calls: int, window_s: float) -> None:
        pass

    def add_blocked_pattern(self, tool: str, arg_pattern: str) -> None:
        pass

    def analyze_call(self, call: ToolCall) -> List[MisuseAlert]:
        # TODO: Run all detection checks, return alerts
        pass

    def _check_rate_limit(self, call: ToolCall) -> Optional[MisuseAlert]:
        pass

    def _check_blocked_patterns(self, call: ToolCall) -> Optional[MisuseAlert]:
        pass

    def _check_prompt_injection(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Detect if tool args contain LLM instruction injection
        pass

    def _check_scope_creep(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Detect if agent is using tools beyond its authorized scope
        pass

    def _check_exfiltration(self, call: ToolCall) -> Optional[MisuseAlert]:
        # TODO: Detect potential data exfiltration patterns
        pass
Lines: 1Characters: 0
Ready
The AI Interview - Master AI/ML Interviews