Agent Guardrails
Guardrails prevent harmful, off-topic, or policy-violating outputs from being returned to users.
Task
Build an OutputGuardrail class that:
- Blocks outputs containing prohibited patterns.
- Enforces maximum output length.
- Returns a structured validation result with safety status and reason.
- Sanitizes output by truncating or masking violations.
Constraints
- Pattern matching is case-insensitive.
- Sanitized output replaces blocked patterns with [BLOCKED].
- If length exceeded, truncate and append '...[truncated]'.
Examples
Example 1:
Input:
g = OutputGuardrail()
g.add_blocked_pattern('secret')
g.validate('The secret code is 1234')
Output:
{'safe': False, 'reason': 'Blocked pattern: secret', 'sanitized': 'The [BLOCKED] code is 1234'}
Explanation: Pattern 'secret' found; output flagged and sanitized.
Starter Code
from typing import List, Dict

class OutputGuardrail:
    def __init__(self):
        self.blocked_patterns: List[str] = []
        self.max_length: int = 2000

    def add_blocked_pattern(self, pattern: str) -> None:
        pass

    def set_max_length(self, length: int) -> None:
        pass

    def validate(self, output: str) -> Dict:
        # TODO: Return {'safe': bool, 'reason': str, 'sanitized': str}
        pass
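A minimal sketch of one possible solution, using `re` for case-insensitive matching and masking. The spec only fixes the reason string for a single blocked pattern; how multiple violations are joined, and the 'OK' reason for safe output, are assumptions made here for illustration:

```python
import re
from typing import Dict, List


class OutputGuardrail:
    """Validates agent output against blocked patterns and a length cap."""

    def __init__(self):
        self.blocked_patterns: List[str] = []
        self.max_length: int = 2000

    def add_blocked_pattern(self, pattern: str) -> None:
        self.blocked_patterns.append(pattern)

    def set_max_length(self, length: int) -> None:
        self.max_length = length

    def validate(self, output: str) -> Dict:
        safe = True
        reasons = []
        sanitized = output

        # Case-insensitive match; mask every occurrence with [BLOCKED].
        for pattern in self.blocked_patterns:
            regex = re.compile(re.escape(pattern), re.IGNORECASE)
            if regex.search(sanitized):
                safe = False
                reasons.append(f'Blocked pattern: {pattern}')
                sanitized = regex.sub('[BLOCKED]', sanitized)

        # Enforce the length cap after masking, then append the marker.
        if len(sanitized) > self.max_length:
            safe = False
            reasons.append('Max length exceeded')
            sanitized = sanitized[:self.max_length] + '...[truncated]'

        # Joining reasons with '; ' and reporting 'OK' when safe are
        # design choices, not requirements from the problem statement.
        return {'safe': safe,
                'reason': '; '.join(reasons) if reasons else 'OK',
                'sanitized': sanitized}
```

Masking before length enforcement means the truncation check runs on the sanitized text; the spec does not dictate the order, so the reverse is also defensible.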