Implement Basic Guardrails for Agent Output

Easy
Agents

Agent Guardrails

Guardrails prevent harmful, off-topic, or policy-violating outputs from being returned to users.

Task

Build an OutputGuardrail class that:

  1. Blocks outputs containing prohibited patterns.
  2. Enforces maximum output length.
  3. Returns a structured validation result with safety status and reason.
  4. Sanitizes output by truncating or masking violations.

Constraints

  • Pattern matching is case-insensitive.
  • Sanitized output replaces blocked patterns with [BLOCKED].
  • If length exceeded, truncate and append '...[truncated]'.

Examples

Example 1:
Input:
g = OutputGuardrail()
g.add_blocked_pattern('secret')
g.validate('The secret code is 1234')
Output: {'safe': False, 'reason': 'Blocked pattern: secret', 'sanitized': 'The [BLOCKED] code is 1234'}
Explanation: Pattern 'secret' found; output flagged and sanitized.

Starter Code

from typing import List, Dict

class OutputGuardrail:
    def __init__(self):
        self.blocked_patterns: List[str] = []
        self.max_length: int = 2000

    def add_blocked_pattern(self, pattern: str) -> None:
        pass

    def set_max_length(self, length: int) -> None:
        pass

    def validate(self, output: str) -> Dict:
        # TODO: Return {'safe': bool, 'reason': str, 'sanitized': str}
        pass
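For reference, here is one possible solution sketch. Details the statement leaves open are assumptions: the reason string when both rules fire joins the individual reasons with '; ', the length check runs after masking, and a clean output reports the reason 'OK'.

```python
import re
from typing import Dict, List

class OutputGuardrail:
    def __init__(self) -> None:
        self.blocked_patterns: List[str] = []
        self.max_length: int = 2000

    def add_blocked_pattern(self, pattern: str) -> None:
        self.blocked_patterns.append(pattern)

    def set_max_length(self, length: int) -> None:
        self.max_length = length

    def validate(self, output: str) -> Dict:
        safe = True
        reasons = []
        sanitized = output

        # Mask each blocked pattern case-insensitively, recording a reason.
        for pattern in self.blocked_patterns:
            if re.search(re.escape(pattern), sanitized, flags=re.IGNORECASE):
                safe = False
                reasons.append(f'Blocked pattern: {pattern}')
                sanitized = re.sub(re.escape(pattern), '[BLOCKED]',
                                   sanitized, flags=re.IGNORECASE)

        # Enforce the length limit on the masked text (an assumption;
        # checking the raw output first is equally defensible).
        if len(sanitized) > self.max_length:
            safe = False
            reasons.append(f'Exceeds max length: {self.max_length}')
            sanitized = sanitized[:self.max_length] + '...[truncated]'

        return {'safe': safe,
                'reason': '; '.join(reasons) if reasons else 'OK',
                'sanitized': sanitized}
```

With this sketch, the call sequence from Example 1 reproduces the expected output dictionary.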
The AI Interview - Master AI/ML Interviews