The AI Interview - Master AI/ML Interviews

Implement a function that verifies whether two mathematical answer strings are equivalent. This check is essential when evaluating LLM outputs on math benchmarks, where a model's answer may be written differently from the ground truth yet still be mathematically correct.

The function should handle:

- Direct string equality
- Numeric equivalence (integers and floats)
- Fraction expressions (e.g., '1/2' should match '0.5')
- Square root expressions (e.g., 'sqrt(4)' should match '2')
- Pi expressions (e.g., 'pi' should match '3.14159...')
- Combinations of the above using basic arithmetic operations

The function should return True if the predicted answer is mathematically equivalent to the ground truth within the specified tolerance, and False otherwise. If either answer is non-numeric or cannot be parsed, and the two strings are not identical, the function should return False.
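The statement leaves open whether the tolerance is absolute or relative. Example 1 compares abs(0.5 - 0.5) against 1e-6, which reads as an absolute tolerance. The sketch below uses a hypothetical helper, `within_tolerance`, to express that check with the standard library's `math.isclose`, and contrasts it with a relative tolerance, which the problem does not ask for:

```python
import math


def within_tolerance(a: float, b: float, tolerance: float = 1e-6) -> bool:
    """Hypothetical helper: absolute check |a - b| <= tolerance for finite a and b."""
    # rel_tol=0.0 disables the relative term, leaving a purely absolute comparison.
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance)


print(within_tolerance(0.5, 0.5))                 # True
print(within_tolerance(1e9, 1e9 + 1))             # False: absolute gap is 1.0
print(math.isclose(1e9, 1e9 + 1, rel_tol=1e-6))   # True: relative gap is about 1e-9
```

With a purely absolute tolerance, large answers must agree almost digit for digit, whereas a relative tolerance scales with magnitude. One edge case differs: `math.isclose` treats two equal infinities as close, while `abs(a - b) <= tolerance` does not, because `inf - inf` is NaN.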

Examples

Example 1:

Input: predicted='1/2', ground_truth='0.5'

Output: True

Explanation: The function first checks if the strings are equal (they are not). Then it parses '1/2' as a mathematical expression, evaluating it to 0.5. It also parses '0.5' directly as a float. Since abs(0.5 - 0.5) = 0 <= 1e-6 (the tolerance), the function returns True, indicating the answers are equivalent.
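The arithmetic in Example 1 can be replayed with Python's standard `fractions` module. This is only an illustration: `Fraction` parses both '1/2' and '0.5' exactly, but it cannot handle `sqrt` or `pi`, so it covers rational answers only and is not presented as the intended solution.

```python
from fractions import Fraction

# Example 1, step by step. Fraction accepts both "1/2" and "0.5" as strings
# and represents them exactly, so no floating-point rounding is involved here.
predicted, ground_truth, tolerance = "1/2", "0.5", 1e-6

same_string = predicted == ground_truth           # False: the strings differ
p = Fraction(predicted)                           # Fraction(1, 2)
g = Fraction(ground_truth)                        # Fraction(1, 2)
difference = abs(p - g)                           # Fraction(0, 1)
equivalent = same_string or difference <= tolerance
print(same_string, p, g, difference, equivalent)  # False 1/2 1/2 0 True
```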

Example 2:

Input: A hidden test case (typically an edge case; the inputs are not shown)

Output: The expected boolean result for that case

Explanation: This example is not disclosed; it checks that the implementation handles inputs beyond Example 1 correctly.

Starter Code

import re
import math

def verify_math_answer(predicted: str, ground_truth: str, tolerance: float = 1e-6) -> bool:
    """
    Verify if two mathematical answers are equivalent.
    
    Args:
        predicted: The predicted answer string
        ground_truth: The ground truth answer string
        tolerance: Numerical tolerance for comparison
    
    Returns:
        True if answers are equivalent, False otherwise
    """
    pass
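One possible way to fill in the starter code is sketched below, under stated assumptions: the accepted grammar is limited to numeric literals, `+ - * /`, unary signs, parentheses, the name `pi`, and `sqrt(x)`; the helper `_to_float` is hypothetical; and the tolerance is treated as absolute, matching Example 1. It walks Python's `ast` tree against a whitelist instead of calling `eval`, so arbitrary code in a model's answer is never executed.

```python
import ast
import math
import operator

# Assumed grammar: numbers, + - * /, unary + and -, parentheses, the name pi, and sqrt(x).
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _to_float(expr: str) -> float:
    """Hypothetical helper: evaluate a whitelisted expression by walking its AST (no eval())."""
    def walk(node: ast.AST) -> float:
        # Plain int/float literals only; bool and complex literals are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return float(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id == "pi":
            return math.pi
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "sqrt" and len(node.args) == 1 and not node.keywords):
            return math.sqrt(walk(node.args[0]))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(expr.strip(), mode="eval").body)


def verify_math_answer(predicted: str, ground_truth: str, tolerance: float = 1e-6) -> bool:
    """Return True if the answers are identical strings or evaluate to values within tolerance."""
    if predicted == ground_truth:  # direct string equality
        return True
    try:
        p = _to_float(predicted)
        g = _to_float(ground_truth)
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError):
        # Unparseable or undefined expressions (e.g. 'x + 1', '1/0', 'sqrt(-1)') are not equivalent.
        return False
    # Absolute-tolerance comparison, as in Example 1; a NaN difference (e.g. inf - inf) compares False.
    return abs(p - g) <= tolerance
```

With the default tolerance of 1e-6, 'pi' matches '3.1415927' (difference about 4.6e-8) but not '3.14159' (difference about 2.7e-6), so ground-truth decimals need enough digits. Expressions outside the assumed grammar, such as 'x + 1' or '2^3', fall through to False unless the two strings are identical.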

Math Answer Verification with Equivalence Checking
