Implement a function that calculates the Exact Match (EM) score between a list of predicted strings and a list of reference (ground truth) strings.
The Exact Match score is a common evaluation metric in NLP tasks like question answering and text generation. It measures the proportion of predictions that exactly match the corresponding references.
To make the comparison more robust, your function should normalize both predictions and references before comparison. The normalization should:
- Convert text to lowercase
- Remove all punctuation characters
- Collapse multiple whitespace characters into single spaces and strip leading/trailing whitespace
The function should return a float representing the proportion of exact matches (between 0.0 and 1.0).
If both input lists are empty, return 0.0.
Examples
Example 1:
Input:
predictions = ['Hello, World!', 'The answer is 42'], references = ['hello world', 'the answer is 42']
Output:
1.0
Explanation: After normalization, 'Hello, World!' becomes 'hello world' (lowercase, punctuation removed), which matches the reference 'hello world'. Similarly, 'The answer is 42' becomes 'the answer is 42', which matches. Both predictions match their references, so EM = 2/2 = 1.0.
Starter Code
import string

def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """
    Calculate the exact match score between predictions and references.

    Args:
        predictions: List of predicted strings
        references: List of reference (ground truth) strings

    Returns:
        Exact match score as a float between 0 and 1
    """
    # Your code here
    pass