Implement a function that verifies if two mathematical answer strings are equivalent. This is a crucial task in evaluating LLM outputs on math benchmarks where the model's answer might be expressed differently from the ground truth but still be mathematically correct.
The function should handle:
- Direct string equality
- Numeric equivalence (integers and floats)
- Fraction expressions (e.g., '1/2' should match '0.5')
- Square root expressions (e.g., 'sqrt(4)' should match '2')
- Pi expressions (e.g., 'pi' should match '3.14159...')
- Combinations with basic arithmetic operations
The function should return True if the predicted answer is mathematically equivalent to the ground truth within the specified tolerance, and False otherwise. Non-numeric or unparseable expressions that don't match exactly should return False.
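One possible way to meet these requirements is sketched below. It parses each string with Python's `ast` module and evaluates only a small whitelist of nodes, so arbitrary code is never executed. The names `evaluate_expression` and `verify_math_answer_sketch` are hypothetical, and stripping surrounding whitespace before the string comparison is an assumption rather than something the task states.

```python
import ast
import math
import operator
from typing import Optional

# Whitelisted operators; anything else is treated as unparseable.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> float:
    """Recursively evaluate a whitelisted arithmetic AST node."""
    # Plain numbers (bool is excluded because type() must match exactly).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return float(node.value)
    # The bare name 'pi' is the only identifier accepted.
    if isinstance(node, ast.Name) and node.id == "pi":
        return math.pi
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    # sqrt(<expr>) with exactly one positional argument.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "sqrt"
        and len(node.args) == 1
        and not node.keywords
    ):
        return math.sqrt(_eval_node(node.args[0]))
    raise ValueError("unsupported expression")


def evaluate_expression(text: str) -> Optional[float]:
    """Return the numeric value of text, or None if it cannot be evaluated."""
    try:
        tree = ast.parse(text.strip(), mode="eval")
        value = _eval_node(tree.body)
        return value if math.isfinite(value) else None
    except (SyntaxError, ValueError, TypeError, ZeroDivisionError,
            OverflowError, RecursionError):
        return None


def verify_math_answer_sketch(
    predicted: str, ground_truth: str, tolerance: float = 1e-6
) -> bool:
    # Step 1: direct string equality (whitespace stripping is an assumption).
    if predicted.strip() == ground_truth.strip():
        return True
    # Step 2: numeric comparison; unparseable input that did not match is False.
    pred_value = evaluate_expression(predicted)
    truth_value = evaluate_expression(ground_truth)
    if pred_value is None or truth_value is None:
        return False
    return abs(pred_value - truth_value) <= tolerance
```

With this sketch, `verify_math_answer_sketch('sqrt(4)', '2')` returns True, while `'x + 1'` against `'1 + x'` returns False because neither side evaluates to a number and the strings differ. A truncated value such as `'3.14159'` differs from `math.pi` by more than 1e-6, so it matches `'pi'` only with a looser tolerance.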
Examples
Example 1:
Input:
predicted='1/2', ground_truth='0.5'
Output:
True
Explanation: The function first checks if the strings are equal (they are not). Then it parses '1/2' as a mathematical expression, evaluating it to 0.5. It also parses '0.5' directly as a float. Since abs(0.5 - 0.5) = 0 <= 1e-6 (the tolerance), the function returns True, indicating the answers are equivalent.
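The steps in this explanation can be traced in a few lines. As a simplification, the sketch uses `fractions.Fraction` from the standard library as a stand-in parser for '1/2'. That choice is an assumption for illustration and would not cover sqrt or pi expressions.

```python
from fractions import Fraction

predicted, ground_truth, tolerance = "1/2", "0.5", 1e-6

# Step 1: direct string comparison fails for this pair.
strings_equal = predicted == ground_truth

# Step 2: evaluate both sides to floats.
# Fraction accepts "numerator/denominator" strings (stand-in parser, an assumption).
pred_value = float(Fraction(predicted))
truth_value = float(ground_truth)

# Step 3: absolute-difference check against the tolerance.
is_equivalent = abs(pred_value - truth_value) <= tolerance

print(strings_equal, pred_value, truth_value, is_equivalent)
# prints: False 0.5 0.5 True
```

The comparison is an absolute one, so a fixed tolerance of 1e-6 is strict for small values and comparatively loose for very large ones.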
Example 2:
Input:
Hidden test case or specific edge case
Output:
Correct evaluated result
Explanation: An additional example to demonstrate the robustness of the implementation.
Starter Code
import re
import math

def verify_math_answer(predicted: str, ground_truth: str, tolerance: float = 1e-6) -> bool:
    """
    Verify if two mathematical answers are equivalent.

    Args:
        predicted: The predicted answer string
        ground_truth: The ground truth answer string
        tolerance: Numerical tolerance for comparison

    Returns:
        True if answers are equivalent, False otherwise
    """
    pass