Implement a function that calculates the Exact Match (EM) score between a list of predicted strings and a list of reference (ground truth) strings.
The Exact Match score is a common evaluation metric in NLP tasks like question answering and text generation. It measures the proportion of predictions that exactly match the corresponding references.
To make the comparison more robust, your function should normalize both predictions and references before comparison. The normalization should:
- Convert text to lowercase
- Remove all punctuation characters
- Collapse multiple whitespace characters into single spaces and strip leading/trailing whitespace
The function should return a float representing the proportion of exact matches (between 0.0 and 1.0).
If both input lists are empty, return 0.0.
Examples
Example 1:
Input:
predictions = ['Hello, World!', 'The answer is 42'], references = ['hello world', 'the answer is 42']
Output:
1.0
Explanation: After normalization, 'Hello, World!' becomes 'hello world' (lowercase, punctuation removed), which matches the reference 'hello world'. Similarly, 'The answer is 42' becomes 'the answer is 42', which matches. Both predictions match their references, so EM = 2/2 = 1.0.
Starter Code
import string

def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """
    Calculate the exact match score between predictions and references.

    Args:
        predictions: List of predicted strings
        references: List of reference (ground truth) strings

    Returns:
        Exact match score as a float between 0 and 1
    """
    # Your code here
    pass