Agent Guardrails
Guardrails prevent harmful, off-topic, or policy-violating outputs from being returned to users.
Task
Build an OutputGuardrail class that:
- Blocks outputs containing prohibited patterns.
- Enforces maximum output length.
- Returns a structured validation result with safety status and reason.
- Sanitizes output by truncating or masking violations.
Constraints
- Pattern matching is case-insensitive.
- Sanitized output replaces blocked patterns with [BLOCKED].
- If length exceeded, truncate and append '...[truncated]'.
Examples
Example 1:
Input:
g = OutputGuardrail()
g.add_blocked_pattern('secret')
g.validate('The secret code is 1234')
Output:
{'safe': False, 'reason': 'Blocked pattern: secret', 'sanitized': 'The [BLOCKED] code is 1234'}
Explanation: Pattern 'secret' found; output flagged and sanitized.
Starter Code
from typing import List, Dict

class OutputGuardrail:
    def __init__(self):
        self.blocked_patterns: List[str] = []
        self.max_length: int = 2000

    def add_blocked_pattern(self, pattern: str) -> None:
        pass

    def set_max_length(self, length: int) -> None:
        pass

    def validate(self, output: str) -> Dict:
        # TODO: Return {'safe': bool, 'reason': str, 'sanitized': str}
        pass
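A minimal sketch of one possible solution, using `re` for case-insensitive matching and masking. The spec only fixes the reason string for a single blocked pattern; how multiple violations are joined, and the 'OK' reason for safe output, are assumptions made here for illustration:

```python
import re
from typing import Dict, List


class OutputGuardrail:
    """Validates agent output against blocked patterns and a length cap."""

    def __init__(self):
        self.blocked_patterns: List[str] = []
        self.max_length: int = 2000

    def add_blocked_pattern(self, pattern: str) -> None:
        self.blocked_patterns.append(pattern)

    def set_max_length(self, length: int) -> None:
        self.max_length = length

    def validate(self, output: str) -> Dict:
        safe = True
        reasons = []
        sanitized = output

        # Case-insensitive match; mask every occurrence with [BLOCKED].
        for pattern in self.blocked_patterns:
            regex = re.compile(re.escape(pattern), re.IGNORECASE)
            if regex.search(sanitized):
                safe = False
                reasons.append(f'Blocked pattern: {pattern}')
                sanitized = regex.sub('[BLOCKED]', sanitized)

        # Enforce the length cap after masking, then append the marker.
        if len(sanitized) > self.max_length:
            safe = False
            reasons.append('Max length exceeded')
            sanitized = sanitized[:self.max_length] + '...[truncated]'

        # Joining reasons with '; ' and reporting 'OK' when safe are
        # design choices, not requirements from the problem statement.
        return {'safe': safe,
                'reason': '; '.join(reasons) if reasons else 'OK',
                'sanitized': sanitized}
```

Masking before length enforcement means the truncation check runs on the sanitized text; the spec does not dictate the order, so the reverse is also defensible.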