Hallucination Detection
Hallucinations occur when agents generate factually incorrect or unsupported information.
Task
Implement a HallucinationDetector class that:
- Extracts claims from agent responses.
- Checks each claim against ground truth documents.
- Returns a hallucination rate (unsupported claims / total claims).
Constraints
- A claim is 'supported' if any keyword from the claim appears in any document.
- Keyword matching is case-insensitive and uses significant words (>4 chars).
- Hallucination rate: 0.0 (none) to 1.0 (all hallucinated).
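The keyword rule in the constraints can be sketched as a small helper (the function name `keywords` and the regex tokenization are my own reading of "significant words"; the exercise does not prescribe them):

```python
import re

def keywords(claim: str) -> list:
    # Significant words: strictly longer than 4 characters,
    # lowercased so matching is case-insensitive.
    return [w.lower() for w in re.findall(r'[A-Za-z]+', claim) if len(w) > 4]

# 'is', 'the', 'of' are too short to count as keywords.
keywords('Paris is the capital of France')
```

Note that a 4-letter word like 'Rome' is excluded, which matters for the worked example below.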
Examples
Example 1:
Input:
det = HallucinationDetector(['Paris is the capital of France'])
det.analyze('Paris is in France. Rome is in Japan.')
Output:
{'claims': [...], 'hallucination_rate': 0.5}
Explanation: First claim supported by the doc; second not ('Japan'/'Rome' do not appear in the docs).
Starter Code
from typing import Dict, List, Optional

class HallucinationDetector:
    def __init__(self, ground_truth_docs: List[str]):
        self.docs = ground_truth_docs

    def extract_claims(self, text: str) -> List[str]:
        # Simple heuristic: split on periods and drop short fragments
        return [s.strip() for s in text.split('.') if len(s.strip()) > 10]

    def verify_claim(self, claim: str) -> Dict:
        # TODO: Check whether the claim is supported by any doc
        # Return {'claim': str, 'supported': bool, 'source': Optional[str]}
        pass

    def analyze(self, response: str) -> Dict:
        # TODO: Return {'claims': list, 'hallucination_rate': float}
        pass
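One way the two TODOs could be completed, applying the constraints as stated (the regex-based keyword extraction inside verify_claim is my own interpretation of "significant words"; the class and method signatures come from the starter code). This is a sketch, not the exercise's official solution:

```python
import re
from typing import Dict, List, Optional

class HallucinationDetector:
    def __init__(self, ground_truth_docs: List[str]):
        self.docs = ground_truth_docs

    def extract_claims(self, text: str) -> List[str]:
        # Same heuristic as the starter code: split on periods,
        # keep fragments longer than 10 characters.
        return [s.strip() for s in text.split('.') if len(s.strip()) > 10]

    def verify_claim(self, claim: str) -> Dict:
        # Keywords: case-insensitive significant words (>4 chars).
        words = [w.lower() for w in re.findall(r'[A-Za-z]+', claim) if len(w) > 4]
        for doc in self.docs:
            doc_lower = doc.lower()
            if any(w in doc_lower for w in words):
                # Supported: at least one keyword appears in this doc.
                return {'claim': claim, 'supported': True, 'source': doc}
        return {'claim': claim, 'supported': False, 'source': None}

    def analyze(self, response: str) -> Dict:
        claims = [self.verify_claim(c) for c in self.extract_claims(response)]
        unsupported = sum(1 for c in claims if not c['supported'])
        # Empty responses yield no claims; report 0.0 rather than divide by zero.
        rate = unsupported / len(claims) if claims else 0.0
        return {'claims': claims, 'hallucination_rate': rate}

det = HallucinationDetector(['Paris is the capital of France'])
result = det.analyze('Paris is in France. Rome is in Japan.')
print(result['hallucination_rate'])  # 0.5, matching Example 1
```

On the worked example, 'Paris' and 'France' match the document, so the first claim is supported; the second claim's only keyword is 'Japan' ('Rome' is just 4 characters), which matches nothing, giving a rate of 0.5.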