Implement the BLEU (Bilingual Evaluation Understudy) score metric, which is widely used to evaluate the quality of machine-generated text by comparing it against one or more reference texts.
Given a candidate sentence (as a list of tokens), a list of reference sentences (each as a list of tokens), and a maximum n-gram order, compute the BLEU score.
Your function should:
- Calculate modified n-gram precision for each n from 1 to max_n, where counts are clipped to avoid gaming by repetition
- Apply a brevity penalty to discourage overly short translations
- Combine the precisions using a geometric mean
- Return 0.0 if any n-gram precision is zero or if the candidate is empty
- When selecting the reference length for brevity penalty with multiple references, choose the length closest to the candidate length (if tied, choose shorter)
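The steps above can be sketched as one possible implementation (the helper names `bleu_sketch` and `ngrams` are illustrative, not part of the starter code):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams as tuples
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_sketch(candidate, references, max_n=4):
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0  # candidate too short to form any n-gram
        # Clip each n-gram count to its maximum count across all references
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # zero precision at this order -> BLEU is 0
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Reference length closest to the candidate length (ties -> shorter)
    c_len = len(candidate)
    r_len = min((len(r) for r in references),
                key=lambda length: (abs(length - c_len), length))
    # Brevity penalty: 1 if the candidate is at least as long as the reference
    bp = 1.0 if c_len >= r_len else math.exp(1 - r_len / c_len)
    return bp * math.exp(sum(log_precisions) / max_n)
```

On Example 1 below, this sketch returns 0.5, matching the expected output.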
Examples
Example 1:
Input:
candidate = ['a', 'b', 'c', 'd'], references = [['a', 'b', 'x', 'd']], max_n = 2
Output:
0.5
Explanation: For 1-grams: candidate has {a, b, c, d}, reference has {a, b, x, d}. Clipped counts: a=1, b=1, c=0, d=1, total clipped=3, total candidate=4, so p1=3/4=0.75. For 2-grams: candidate has {(a,b), (b,c), (c,d)}, reference has {(a,b), (b,x), (x,d)}. Only (a,b) matches, so p2=1/3. Geometric mean = exp((log(0.75) + log(0.333))/2) = exp(-0.693) = 0.5. Since candidate length equals reference length, brevity penalty = 1.0. Final BLEU = 1.0 * 0.5 = 0.5.
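The clipped counts in the explanation can be reproduced directly with collections.Counter (a small standalone check, not part of the required solution):

```python
from collections import Counter

candidate = ['a', 'b', 'c', 'd']
reference = ['a', 'b', 'x', 'd']

# Unigram precision: clip each candidate count to the reference count
cand_1 = Counter(candidate)
ref_1 = Counter(reference)
clipped_1 = sum(min(c, ref_1[g]) for g, c in cand_1.items())
print(clipped_1, sum(cand_1.values()))  # 3 4 -> p1 = 3/4

# Bigram precision: same clipping applied to 2-grams
def bigrams(tokens):
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

cand_2 = Counter(bigrams(candidate))
ref_2 = Counter(bigrams(reference))
clipped_2 = sum(min(c, ref_2[g]) for g, c in cand_2.items())
print(clipped_2, sum(cand_2.values()))  # 1 3 -> p2 = 1/3
```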
Starter Code
import numpy as np
from collections import Counter

def bleu_score(candidate: list[str], references: list[list[str]], max_n: int = 4) -> float:
    """
    Calculate BLEU score for a candidate sentence against reference sentences.

    Args:
        candidate: List of tokens in the candidate sentence
        references: List of reference sentences, each as a list of tokens
        max_n: Maximum n-gram order (default: 4)

    Returns:
        BLEU score between 0 and 1
    """
    # Your code here
    pass