The AI Interview - Master AI/ML Interviews

In production ML systems, monitoring the health of batch prediction jobs is essential for maintaining service reliability. Given a list of prediction results from a batch job, compute key health metrics that are commonly tracked in MLOps dashboards.

Each prediction result is a dictionary with:

'status': Either 'success' or 'error'
'confidence': A float between 0 and 1 (only present when status is 'success')

Write a function calculate_batch_health(predictions, confidence_threshold) that computes:

Success Rate: Percentage of predictions that completed successfully
Average Confidence: Mean confidence score of successful predictions (as a percentage)
Low Confidence Rate: Percentage of successful predictions with confidence below the threshold
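
The three metrics can be written as formulas. The symbols below are notation introduced here for illustration and do not appear in the original problem: N is the total number of predictions, S is the set of successful predictions, c_i is the confidence of prediction i, and \tau is the confidence threshold.

```latex
\begin{aligned}
\text{success\_rate} &= 100 \cdot \frac{|S|}{N} \\
\text{avg\_confidence} &= 100 \cdot \frac{1}{|S|} \sum_{i \in S} c_i \\
\text{low\_confidence\_rate} &= 100 \cdot \frac{\left|\{\, i \in S : c_i < \tau \,\}\right|}{|S|}
\end{aligned}
```

Note that the last two metrics divide by |S|, not N, which is why the case with no successful predictions needs separate handling.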

The function should return a dictionary with these three metrics under the keys 'success_rate', 'avg_confidence', and 'low_confidence_rate'. If the input list is empty, return an empty dictionary. If there are no successful predictions, return success_rate as calculated (which is 0.0 in that case) and set both confidence metrics to 0.0.

All returned values should be rounded to 2 decimal places.

Examples

Example 1:

Input:

predictions = [{'status': 'success', 'confidence': 0.9}, {'status': 'success', 'confidence': 0.8}, {'status': 'error'}, {'status': 'success', 'confidence': 0.4}, {'status': 'success', 'confidence': 0.7}], confidence_threshold = 0.5

Output: {'success_rate': 80.0, 'avg_confidence': 70.0, 'low_confidence_rate': 25.0}

Explanation: Out of 5 predictions, 4 succeeded (80% success rate). The successful predictions have confidences [0.9, 0.8, 0.4, 0.7], averaging to 0.7 (70%). Only one prediction (0.4) is below the 0.5 threshold, giving a low confidence rate of 1/4 = 25%.
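
The arithmetic in this explanation can be checked step by step. The snippet below is only a sketch that reproduces the numbers for Example 1. The variable names are chosen here for illustration and are not part of the problem statement.

```python
# Data from Example 1
predictions = [
    {'status': 'success', 'confidence': 0.9},
    {'status': 'success', 'confidence': 0.8},
    {'status': 'error'},
    {'status': 'success', 'confidence': 0.4},
    {'status': 'success', 'confidence': 0.7},
]
confidence_threshold = 0.5

# Keep only the confidences of successful predictions
confidences = [p['confidence'] for p in predictions if p['status'] == 'success']

# 4 of 5 predictions succeeded
success_rate = round(100 * len(confidences) / len(predictions), 2)

# Mean of [0.9, 0.8, 0.4, 0.7], expressed as a percentage
avg_confidence = round(100 * sum(confidences) / len(confidences), 2)

# Only 0.4 falls strictly below the 0.5 threshold: 1 of 4
low_count = sum(1 for c in confidences if c < confidence_threshold)
low_confidence_rate = round(100 * low_count / len(confidences), 2)

print(success_rate, avg_confidence, low_confidence_rate)  # 80.0 70.0 25.0
```

Rounding is applied at the end of each computation, so small floating-point errors in the sum do not leak into the reported values.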

Starter Code

def calculate_batch_health(predictions: list, confidence_threshold: float = 0.5) -> dict:
    """
    Calculate health metrics for a batch prediction job.
    
    Args:
        predictions: list of prediction results, each a dict with 'status' and optionally 'confidence'
        confidence_threshold: threshold below which a prediction is considered low confidence
    
    Returns:
        dict with keys: 'success_rate', 'avg_confidence', 'low_confidence_rate'
        All values as percentages (0-100), rounded to 2 decimal places.
    """
    pass
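
One possible way to fill in the stub is sketched below. It is not the site's reference solution, only a minimal implementation under the rules stated above. It assumes that "below the threshold" means strictly less than, and that every successful prediction carries a 'confidence' key.

```python
def calculate_batch_health(predictions: list, confidence_threshold: float = 0.5) -> dict:
    """Sketch of a solution: returns percentages rounded to 2 decimal places."""
    # Empty batch: the problem asks for an empty dictionary
    if not predictions:
        return {}

    # Confidences of successful predictions only
    # (assumes 'confidence' is present whenever status is 'success')
    confidences = [p['confidence'] for p in predictions if p.get('status') == 'success']

    success_rate = round(100 * len(confidences) / len(predictions), 2)

    # No successes: avoid dividing by zero and report 0.0 for both confidence metrics
    if not confidences:
        return {
            'success_rate': success_rate,
            'avg_confidence': 0.0,
            'low_confidence_rate': 0.0,
        }

    avg_confidence = round(100 * sum(confidences) / len(confidences), 2)

    # Assumption: "below the threshold" is a strict comparison
    low_count = sum(1 for c in confidences if c < confidence_threshold)
    low_confidence_rate = round(100 * low_count / len(confidences), 2)

    return {
        'success_rate': success_rate,
        'avg_confidence': avg_confidence,
        'low_confidence_rate': low_confidence_rate,
    }


# Usage with the data from Example 1
example = [
    {'status': 'success', 'confidence': 0.9},
    {'status': 'success', 'confidence': 0.8},
    {'status': 'error'},
    {'status': 'success', 'confidence': 0.4},
    {'status': 'success', 'confidence': 0.7},
]
result = calculate_batch_health(example, 0.5)
print(result)  # {'success_rate': 80.0, 'avg_confidence': 70.0, 'low_confidence_rate': 25.0}
```

Filtering the confidences once and reusing the list keeps the three metrics consistent with each other, and the early returns cover the two edge cases before any division by the number of successes.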

Calculate Batch Prediction Health Metrics