Math Answer Verification with Equivalence Checking

Medium
LLM

Implement a function that checks whether two mathematical answer strings are equivalent. This task is central to evaluating LLM outputs on math benchmarks, where the model's answer may be written differently from the ground truth yet still be mathematically correct.

The function should handle:

  • Direct string equality
  • Numeric equivalence (integers and floats)
  • Fraction expressions (e.g., '1/2' should match '0.5')
  • Square root expressions (e.g., 'sqrt(4)' should match '2')
  • Pi expressions (e.g., 'pi' should match a decimal such as '3.1415927' that agrees with π within the tolerance)
  • Combinations with basic arithmetic operations

The function should return True if the predicted answer is mathematically equivalent to the ground truth within the specified tolerance, and False otherwise. Non-numeric or unparseable expressions that don't match exactly should return False.
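
One way to handle the fraction, square root, pi, and arithmetic cases above is to parse each string into a syntax tree and evaluate only a whitelist of node types, which avoids calling eval on untrusted model output. The following is a minimal sketch, not a reference solution. The names evaluate_expression and _eval_node are hypothetical and not part of the starter code, and the sketch assumes that only +, -, *, /, unary signs, sqrt(...), and the constant pi need to be supported.

```python
import ast
import math
import operator

# Whitelisted operators. Anything else (names, attributes, other calls) is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate a whitelisted AST node; raise ValueError otherwise."""
    # Numeric literals (bool is excluded because it is a subclass of int).
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    # The bare name 'pi' is the only identifier allowed (assumption).
    if isinstance(node, ast.Name) and node.id == "pi":
        return math.pi
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # sqrt(x) with exactly one positional argument is the only call allowed.
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "sqrt"
            and len(node.args) == 1
            and not node.keywords):
        return math.sqrt(_eval_node(node.args[0]))
    raise ValueError("unsupported expression")


def evaluate_expression(text):
    """Return the value of a simple math expression as a float, or None if unparseable."""
    try:
        tree = ast.parse(text.strip(), mode="eval")
        return float(_eval_node(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError):
        return None


print(evaluate_expression("1/2"))      # a fraction
print(evaluate_expression("sqrt(4)"))  # a square root
print(evaluate_expression("x + 1"))    # unsupported name, so None
```

With a helper like this, the verifier can return False whenever either side evaluates to None and the strings are not identical. Exponentiation is deliberately left out of the sketch, since very large powers can be slow to compute.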

Examples

Example 1:
Input: predicted='1/2', ground_truth='0.5'
Output: True
Explanation: The function first checks if the strings are equal (they are not). Then it parses '1/2' as a mathematical expression, evaluating it to 0.5. It also parses '0.5' directly as a float. Since abs(0.5 - 0.5) = 0 <= 1e-6 (the tolerance), the function returns True, indicating the answers are equivalent.
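
The flow in this explanation (exact string match first, then parse both sides, then compare within the tolerance) can be sketched as follows. This is an illustrative sketch only: compare_simple_answers is a hypothetical name, whitespace stripping is an assumption, and the parsing uses the standard library's fractions.Fraction, which covers plain numbers and 'a/b' fractions but not sqrt or pi expressions.

```python
from fractions import Fraction


def compare_simple_answers(predicted, ground_truth, tolerance=1e-6):
    """Hypothetical helper: compare plain numbers and 'a/b' fractions only."""
    predicted = predicted.strip()
    ground_truth = ground_truth.strip()

    # Step 1: identical strings are equivalent, even if non-numeric.
    if predicted == ground_truth:
        return True

    # Step 2: parse both sides; anything unparseable cannot match.
    try:
        p = Fraction(predicted)
        g = Fraction(ground_truth)
    except (ValueError, ZeroDivisionError):
        return False

    # Step 3: equivalent if the absolute difference is within the tolerance.
    return abs(p - g) <= Fraction(tolerance)


print(compare_simple_answers("1/2", "0.5"))  # the pair from Example 1
print(compare_simple_answers("1/3", "0.33"))
print(compare_simple_answers("abc", "abd"))
```

Parsing with Fraction keeps the difference exact until the final comparison. A full solution would swap in an expression evaluator at step 2 to cover the sqrt, pi, and arithmetic cases.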

Starter Code

import re
import math

def verify_math_answer(predicted: str, ground_truth: str, tolerance: float = 1e-6) -> bool:
    """
    Verify if two mathematical answers are equivalent.
    
    Args:
        predicted: The predicted answer string
        ground_truth: The ground truth answer string
        tolerance: Numerical tolerance for comparison
    
    Returns:
        True if answers are equivalent, False otherwise
    """
    pass