Classify tool determinism by empirical testing:
- Run tool_func(**inputs) n_trials times for each test input
- Compare outputs: if ANY trial differs, the tool is non-deterministic
- Return the classification with supporting evidence
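The run-and-compare steps above can be sketched in Python as two small helpers. This is a minimal sketch, not the reference solution, and the names `run_trials` and `first_mismatch` are hypothetical. For ordinary values such as numbers, strings and lists, `==` is transitive, so checking every trial against trial 1 is enough to detect whether ANY trial differs. Trial numbers are 1-based to match the "Trial 1 / Trial 2" wording of the evidence field.

```python
def run_trials(tool_func, inputs, n_trials):
    """Call tool_func(**inputs) n_trials times and return the outputs in order.

    Exceptions are not caught here; they propagate to the caller.
    """
    return [tool_func(**inputs) for _ in range(n_trials)]


def first_mismatch(outputs):
    """Return (trial_number, value) for the first output that differs from
    trial 1 under ==, or None if every output matches (or there are none)."""
    for trial_number, value in enumerate(outputs[1:], start=2):
        if value != outputs[0]:
            return trial_number, value
    return None
```

For example, `run_trials(lambda x: x * 2, {'x': 5}, 5)` returns `[10, 10, 10, 10, 10]`, and `first_mismatch` on that list returns `None`.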
Output Format:
{
'is_deterministic': True/False,
'evidence': 'All N trials produced identical results' OR 'Trial 1: X, Trial 2: Y differed',
'confidence': 0.0-1.0 # 1.0 if deterministic, 0.7 if non-deterministic
}
Constraints:
- Handle exceptions by marking the tool non-deterministic (it may be flaky)
- Compare outputs using ==
- Confidence = 1.0 if deterministic, 0.7 if non-deterministic (empirical evidence)
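The exception and confidence constraints can be sketched as two small helpers. The names `safe_call` and `confidence_for` are hypothetical, and catching `Exception` rather than `BaseException` (so that KeyboardInterrupt and SystemExit still stop a long run) is an assumption of this sketch, not something the constraints specify.

```python
def safe_call(tool_func, inputs):
    """Call tool_func(**inputs) once.

    Returns ('ok', result) on success or ('error', exc) if the call raised,
    so the caller can mark the tool non-deterministic instead of crashing.
    """
    try:
        return 'ok', tool_func(**inputs)
    except Exception as exc:  # KeyboardInterrupt/SystemExit still propagate
        return 'error', exc


def confidence_for(is_deterministic):
    """Fixed confidence values taken from the constraints."""
    return 1.0 if is_deterministic else 0.7
```

Returning a status tag instead of re-raising keeps the trial loop simple: any `'error'` result can be turned directly into a non-deterministic verdict with confidence 0.7.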
Examples
Example 1:
Input:
classify_tool_determinism(lambda x: x*2, [{'x': 5}], 5)
Output:
{'is_deterministic': True, 'evidence': 'All 5 trials produced identical results', 'confidence': 1.0}
Explanation: A pure function always returns the same output for the same input.
Starter Code
```python
def classify_tool_determinism(tool_func, test_inputs, n_trials=3):
    """
    Classify whether a tool is deterministic or non-deterministic
    by running it multiple times with the same inputs.

    Args:
        tool_func: Function to test
        test_inputs: List of input dicts to test
        n_trials: Number of times to run each input

    Returns:
        dict with 'is_deterministic', 'evidence', 'confidence'
    """
    # Your implementation here
    pass
```
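One possible way to fill in the starter code, combining the steps and constraints of the problem. It is a sketch under stated assumptions, not an official solution: N in the success message is taken to be n_trials (which matches Example 1), the exact wording of the failure and exception evidence is a guess at the 'Trial 1: X, Trial 2: Y differed' template, and the function stops at the first input that shows non-determinism.

```python
def classify_tool_determinism(tool_func, test_inputs, n_trials=3):
    """Sketch: run tool_func(**inputs) n_trials times per input and
    compare every trial's output with trial 1 using ==."""
    for inputs in test_inputs:
        first = None
        for trial in range(1, n_trials + 1):
            try:
                result = tool_func(**inputs)
            except Exception as exc:
                # Constraint: an exception marks the tool non-deterministic (possibly flaky).
                return {
                    'is_deterministic': False,
                    'evidence': f'Trial {trial} raised {type(exc).__name__}: {exc}',
                    'confidence': 0.7,
                }
            if trial == 1:
                first = result  # baseline output for this input
            elif result != first:
                # Any trial that differs from trial 1 proves non-determinism.
                return {
                    'is_deterministic': False,
                    'evidence': f'Trial 1: {first}, Trial {trial}: {result} differed',
                    'confidence': 0.7,
                }
    # Assumption: N in the message is n_trials (per input), as in Example 1.
    return {
        'is_deterministic': True,
        'evidence': f'All {n_trials} trials produced identical results',
        'confidence': 1.0,
    }
```

With the tool from Example 1, `classify_tool_determinism(lambda x: x*2, [{'x': 5}], 5)` returns the output shown there. A stateful tool such as `lambda: next(counter)` with `counter = iter(range(10))` and `test_inputs=[{}]` yields the evidence `'Trial 1: 0, Trial 2: 1 differed'` with confidence 0.7. Note that a tool returning `float('nan')` is classified as non-deterministic, because `nan != nan` under `==`.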