Implement a Constitutional AI safety layer:
Constitutional AI: train or constrain an AI system to follow a set of principles (a constitution).
Methods:
- critique_output(output, context): Check the output against each principle. Return violations as:
  [{'principle': ..., 'severity': ..., 'explanation': ...}]
- revise_output(output, critiques): Fix the violations.
- self_critique_loop(initial_output, max_iterations): Iterate critique → revise.
- add_principle(principle, priority): Extend the constitution.
- evaluate_compliance(outputs): Report statistics on adherence.
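The critique → revise cycle behind self_critique_loop can be sketched independently of how critiques are produced. The helper critique_revise_loop below is hypothetical: it takes the critique and revise steps as plain callables and returns both the final text and any violations still present, which is one possible return shape rather than something the task specifies. The toy critique and revise functions exist only to exercise the loop.

```python
from typing import Callable, Dict, List, Tuple

Violation = Dict[str, str]  # {'principle': ..., 'severity': ..., 'explanation': ...}


def critique_revise_loop(
    output: str,
    critique: Callable[[str], List[Violation]],
    revise: Callable[[str, List[Violation]], str],
    max_iterations: int = 3,
) -> Tuple[str, List[Violation]]:
    """Alternate critique and revision until the output is clean or the budget runs out.

    Returns the final output together with the violations still present in it.
    """
    violations = critique(output)
    for _ in range(max_iterations):
        if not violations:
            break  # compliant: no further revision needed
        output = revise(output, violations)
        violations = critique(output)  # re-check the revised text
    return output, violations


# Toy critique/revise pair, for illustration only.
def toy_critique(text: str) -> List[Violation]:
    if "stupid" in text.lower():
        return [{"principle": "Be helpful and harmless", "severity": "low",
                 "explanation": "contains an insult"}]
    return []


def toy_revise(text: str, violations: List[Violation]) -> str:
    return text.replace("stupid", "[removed]")


final, remaining = critique_revise_loop("That is a stupid question", toy_critique, toy_revise)
print(final)      # That is a [removed] question
print(remaining)  # []
```

Re-running the critique after every revision means the returned violations always describe the returned text, so a caller can tell whether the loop stopped because the output became compliant or because max_iterations ran out.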
Constitution Format: List of strings like:
- 'Be helpful and harmless'
- 'Respect user privacy'
- 'Avoid generating harmful content'
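The constitution is a plain list of strings, yet add_principle also takes a priority. One way to reconcile the two, assuming the levels 'high', 'medium' and 'low' (the task only shows 'high' as a default), is to keep a principle-to-priority map and derive an ordered view from it. The Constitution class, the PRIORITY_RANK table and the "Cite sources when possible" principle are illustrative inventions, not part of the starter code.

```python
# Assumed priority levels; the task only shows 'high' as the default.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


class Constitution:
    """Hypothetical helper: principles stored with a priority, listed highest first."""

    def __init__(self, principles):
        # Principles supplied up front default to 'high' priority (an assumption).
        self.priorities = {p: "high" for p in principles}

    def add_principle(self, principle, priority="high"):
        if priority not in PRIORITY_RANK:
            raise ValueError(f"unknown priority: {priority!r}")
        self.priorities[principle] = priority  # re-adding only updates the priority

    def ordered(self):
        # sorted() is stable, so insertion order breaks ties within a priority level.
        return sorted(self.priorities, key=lambda p: PRIORITY_RANK[self.priorities[p]])


c = Constitution(["Be helpful and harmless", "Respect user privacy"])
c.add_principle("Cite sources when possible", priority="low")  # hypothetical principle
c.add_principle("Avoid generating harmful content")
print(c.ordered())
# ['Be helpful and harmless', 'Respect user privacy',
#  'Avoid generating harmful content', 'Cite sources when possible']
```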
Critique Simulation: For testing, simulate the critique with keyword matching against violation patterns.
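A minimal sketch of the keyword-matching critique, assuming a hypothetical VIOLATION_PATTERNS table that maps each principle to trigger phrases and a severity. The phrases and severity labels are invented for illustration, and principles with no entry in the table never produce a violation. Each violation uses the {'principle', 'severity', 'explanation'} shape required by critique_output.

```python
# Hypothetical violation patterns: principle -> (trigger phrases, severity).
# Neither the phrases nor the severity labels come from the task statement.
VIOLATION_PATTERNS = {
    "Be honest": (["guaranteed to work", "100% certain"], "medium"),
    "Respect user privacy": (["password", "home address"], "high"),
    "Avoid generating harmful content": (["build a bomb", "make a weapon"], "high"),
}


def keyword_critique(output, constitution, patterns=VIOLATION_PATTERNS):
    """Return one violation dict per principle whose trigger phrases occur in output."""
    text = output.lower()  # case-insensitive substring matching
    violations = []
    for principle in constitution:
        # A principle missing from the table has no phrases, so it never fires.
        phrases, severity = patterns.get(principle, ([], "low"))
        hits = [ph for ph in phrases if ph in text]
        if hits:
            violations.append({
                "principle": principle,
                "severity": severity,
                "explanation": "matched: " + ", ".join(hits),
            })
    return violations


constitution = ["Be helpful", "Be honest", "Respect user privacy"]
print(keyword_critique("I will help you", constitution))  # []
flagged = keyword_critique("This fix is 100% certain. Send me your password.", constitution)
print([v["principle"] for v in flagged])  # ['Be honest', 'Respect user privacy']
```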
Examples
Example 1:
Input:
const = ['Be helpful', 'Be honest']; layer = ConstitutionalAILayer(const); violations = layer.critique_output('I will help you', {}); isinstance(violations, list)
Output:
True
Explanation: critique_output returns a list of violations (empty if the output is compliant).
Starter Code
class ConstitutionalAILayer:
    """
    Safety layer implementing constitutional AI principles.
    """

    def __init__(self, constitution):
        self.constitution = constitution  # List of principles
        self.critique_model = None
        self.revision_history = []

    def critique_output(self, output, context):
        """
        Critique output against constitutional principles.
        Returns list of violations with severity.
        """
        # Your implementation here
        pass

    def revise_output(self, output, critiques):
        """
        Revise output to address critiques.
        Returns revised output.
        """
        # Your implementation here
        pass

    def self_critique_loop(self, initial_output, max_iterations=3):
        """
        Iteratively critique and revise until no violations or max iterations.
        """
        # Your implementation here
        pass

    def add_principle(self, principle, priority='high'):
        """Add new constitutional principle."""
        # Your implementation here
        pass

    def evaluate_compliance(self, outputs):
        """Evaluate compliance rate across multiple outputs."""
        # Your implementation here
        pass
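One possible completion of the starter code is sketched below. It simulates critique with keyword matching, as the task suggests for testing, and it assumes a hypothetical patterns table, a redaction-based revision strategy (matched phrases become "[REDACTED]") and a dictionary return shape for evaluate_compliance; none of these details are fixed by the task, and critique_model is left unused.

```python
import re


class ConstitutionalAILayer:
    """Safety layer implementing constitutional AI principles (keyword-based sketch)."""

    # Hypothetical table: principle -> (trigger phrases, severity). Illustrative only.
    DEFAULT_PATTERNS = {
        "Be honest": (["guaranteed to work", "100% certain"], "medium"),
        "Respect user privacy": (["password", "home address"], "high"),
        "Avoid generating harmful content": (["build a bomb", "make a weapon"], "high"),
    }

    def __init__(self, constitution, patterns=None):
        self.constitution = list(constitution)  # List of principles
        self.priorities = {p: "high" for p in self.constitution}
        self.patterns = dict(self.DEFAULT_PATTERNS if patterns is None else patterns)
        self.critique_model = None  # unused: keyword matching stands in for a model
        self.revision_history = []

    def critique_output(self, output, context):
        """Return one violation per principle whose trigger phrases occur in output."""
        # `context` is accepted to match the required signature but ignored here.
        text = output.lower()
        violations = []
        for principle in self.constitution:
            phrases, severity = self.patterns.get(principle, ([], "low"))
            hits = [ph for ph in phrases if ph in text]
            if hits:
                violations.append({
                    "principle": principle,
                    "severity": severity,
                    "explanation": "matched: " + ", ".join(hits),
                })
        return violations

    def revise_output(self, output, critiques):
        """Redact every trigger phrase of each violated principle (case-insensitive)."""
        revised = output
        for critique in critiques:
            phrases, _ = self.patterns.get(critique["principle"], ([], "low"))
            for ph in phrases:
                revised = re.sub(re.escape(ph), "[REDACTED]", revised, flags=re.IGNORECASE)
        self.revision_history.append(
            {"before": output, "after": revised, "critiques": critiques}
        )
        return revised

    def self_critique_loop(self, initial_output, max_iterations=3):
        """Critique and revise until no violations remain or max_iterations is reached."""
        output = initial_output
        for _ in range(max_iterations):
            critiques = self.critique_output(output, {})
            if not critiques:
                break  # compliant: stop early
            output = self.revise_output(output, critiques)
        return output

    def add_principle(self, principle, priority="high"):
        """Add a new constitutional principle; re-adding one only updates its priority."""
        if principle not in self.priorities:
            self.constitution.append(principle)
        self.priorities[principle] = priority

    def evaluate_compliance(self, outputs):
        """Compliance rate plus per-principle violation counts across outputs."""
        by_principle = {p: 0 for p in self.constitution}
        compliant = 0
        for output in outputs:
            violations = self.critique_output(output, {})
            if not violations:
                compliant += 1
            for v in violations:
                by_principle[v["principle"]] += 1
        total = len(outputs)
        return {
            "total": total,
            "compliant": compliant,
            # An empty batch is treated as fully compliant (an assumption).
            "compliance_rate": compliant / total if total else 1.0,
            "violations_by_principle": by_principle,
        }


layer = ConstitutionalAILayer(["Be helpful", "Be honest", "Respect user privacy"])
print(isinstance(layer.critique_output("I will help you", {}), list))  # True
revised = layer.self_critique_loop("It is 100% certain; your password is hunter2")
print(revised)  # It is [REDACTED]; your [REDACTED] is hunter2
stats = layer.evaluate_compliance(["I will help you", "Send me your password"])
print(stats["compliance_rate"])  # 0.5
```

Redaction is chosen here only because it guarantees that the same keyword check passes on the next iteration; a model-backed layer would more plausibly regenerate the text with the critiques as guidance.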