Evaluate Translation Quality with METEOR Score

Medium
NLP

Develop a function to compute the METEOR score for evaluating machine translation quality. Given a reference translation and a candidate translation, tokenize both case-insensitively on whitespace and count the exact unigram matches m. With candidate length c and reference length r, compute precision P = m/c, recall R = m/r, and the parameterized F-mean F = (P * R) / (alpha * P + (1 - alpha) * R). Group the matched unigrams into the minimum number of chunks (maximal runs of matches that are contiguous and in the same order in both sentences), apply the fragmentation penalty = gamma * (chunks / m)^beta, and return F * (1 - penalty). If there are no matches, return 0.

Examples

Example 1:
Input: meteor_score('Rain falls gently from the sky', 'Gentle rain drops from the sky')
Output: 0.625
Explanation: After lowercasing, the function finds 4 exact unigram matches ('rain', 'from', 'the', 'sky'). Note that 'gently' and 'gentle' do NOT match since exact matching is used. Precision = 4/6, Recall = 4/6, giving F-mean ≈ 0.667. The matched reference positions [0, 3, 4, 5] form 2 chunks ('rain' alone, then 'from the sky'), so penalty = 0.5 × (2/4)^3 = 0.0625 and score ≈ 0.667 × (1 − 0.0625) = 0.625.
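The arithmetic in this example can be verified in a few lines, plugging the counts from the explanation into the formulas with the default parameters (variable names here are illustrative, not part of the required solution):

```python
# Worked arithmetic for Example 1, using the counts from the explanation.
matches, cand_len, ref_len, chunks = 4, 6, 6, 2
alpha, beta, gamma = 0.9, 3, 0.5

precision = matches / cand_len                        # 4/6 ≈ 0.667
recall = matches / ref_len                            # 4/6 ≈ 0.667
fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
penalty = gamma * (chunks / matches) ** beta          # 0.5 * 0.5**3 = 0.0625
score = fmean * (1 - penalty)

print(round(score, 3))  # 0.625
```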

Starter Code

import numpy as np
from collections import Counter

def meteor_score(reference, candidate, alpha=0.9, beta=3, gamma=0.5):
    """
    Calculate METEOR score for machine translation evaluation.
    
    Args:
        reference: Reference translation string
        candidate: Candidate translation string
        alpha: Weight for precision vs recall in F-mean (default 0.9)
        beta: Exponent for fragmentation penalty (default 3)
        gamma: Maximum penalty coefficient (default 0.5)
    
    Returns:
        METEOR score between 0 and 1
    """
    # Your code here
    pass
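One possible solution is sketched below. It uses a greedy exact-match alignment: each candidate token is paired with the first unused reference position holding the same word, and chunks are then counted as maximal runs where both candidate and reference indices advance by one. The function name `reference_meteor` and the greedy alignment strategy are assumptions for illustration, not part of the starter code:

```python
def reference_meteor(reference, candidate, alpha=0.9, beta=3, gamma=0.5):
    ref = reference.lower().split()
    cand = candidate.lower().split()

    # Greedy exact alignment: pair each candidate token with the first
    # unused reference position that holds the same word.
    used = [False] * len(ref)
    alignment = []  # (candidate_index, reference_index) pairs, in candidate order
    for ci, word in enumerate(cand):
        for ri, rword in enumerate(ref):
            if not used[ri] and rword == word:
                used[ri] = True
                alignment.append((ci, ri))
                break

    m = len(alignment)
    if m == 0:
        return 0.0

    precision = m / len(cand)
    recall = m / len(ref)
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)

    # A new chunk starts whenever the aligned pair is not the direct
    # successor (both indices +1) of the previous pair.
    chunks = 1
    for (pc, pr), (cc, cr) in zip(alignment, alignment[1:]):
        if cc != pc + 1 or cr != pr + 1:
            chunks += 1

    penalty = gamma * (chunks / m) ** beta
    return fmean * (1 - penalty)
```

On Example 1 this yields the alignment [(1, 0), (3, 3), (4, 4), (5, 5)], i.e. 4 matches in 2 chunks, and a score of 0.625. Note that greedy alignment is a simplification; full METEOR chooses the alignment that minimizes the number of chunks.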