The AI Interview - Master AI/ML Interviews

In production ML systems, Service Level Agreement (SLA) monitoring is crucial for ensuring your model serving endpoints meet performance guarantees. Given a list of request results from a model serving endpoint, compute key SLA compliance metrics.

Each request result is a dictionary with:

'latency_ms': Response latency in milliseconds (numeric; nominally a float, though Example 1 uses integer values)
'status': Either 'success', 'error', or 'timeout'

Write a function calculate_sla_metrics(requests, latency_sla_ms) that computes:

Latency SLA Compliance: Percentage of successful requests whose latency_ms is at most latency_sla_ms (the threshold is inclusive)
Error Rate: Percentage of all requests that resulted in an error or timeout
Overall SLA Compliance: Percentage of all requests that both succeeded AND met the latency threshold

The function should return a dictionary with these three metrics. If the input list is empty, return an empty dictionary. If there are no successful requests, latency_sla_compliance should be 0.0.

All returned values should be percentages (0-100) rounded to 2 decimal places.
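The three definitions and the edge cases above can be sketched as follows. This is one possible reading of the specification, not an official solution: the name sla_metrics_sketch is hypothetical (chosen so it does not shadow the function you are asked to write), and the comparison is assumed to be inclusive, which is consistent with the "<= 100ms" wording in the Example 1 explanation.

```python
def sla_metrics_sketch(requests, latency_sla_ms=100.0):
    """Hypothetical sketch of the three SLA metrics described above."""
    total = len(requests)
    if total == 0:
        return {}  # empty input -> empty dictionary, per the spec

    # Latencies of successful requests only.
    success_latencies = [
        r["latency_ms"] for r in requests if r["status"] == "success"
    ]
    successes = len(success_latencies)

    # Assumption: the threshold is inclusive (latency == SLA still complies).
    within_sla = sum(1 for lat in success_latencies if lat <= latency_sla_ms)

    # Failures are requests whose status is 'error' or 'timeout'.
    failures = sum(1 for r in requests if r["status"] in ("error", "timeout"))

    # With no successful requests, latency compliance is defined as 0.0.
    latency_compliance = 100.0 * within_sla / successes if successes else 0.0
    error_rate = 100.0 * failures / total
    overall = 100.0 * within_sla / total

    return {
        "latency_sla_compliance": round(latency_compliance, 2),
        "error_rate": round(error_rate, 2),
        "overall_sla_compliance": round(overall, 2),
    }
```

Note that the first metric divides by the number of successful requests, while the other two divide by the total number of requests, so the zero-success guard is only needed for the first.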

Examples

Example 1:

Input:

requests = [
    {'status': 'success', 'latency_ms': 50},
    {'status': 'success', 'latency_ms': 80},
    {'status': 'success', 'latency_ms': 120},
    {'status': 'error', 'latency_ms': 30},
    {'status': 'timeout', 'latency_ms': 5000},
]
latency_sla_ms = 100.0

Output: {'latency_sla_compliance': 66.67, 'error_rate': 40.0, 'overall_sla_compliance': 40.0}

Explanation: Out of 5 total requests, 3 succeeded. Of the 3 successful requests, 2 had latency <= 100ms (50ms and 80ms), giving latency_sla_compliance = 2/3 * 100 = 66.67%. There were 2 failed requests (1 error + 1 timeout), giving error_rate = 2/5 * 100 = 40%. Overall SLA compliance = 2/5 * 100 = 40% (requests that both succeeded AND met the latency threshold).
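The arithmetic in this explanation can be reproduced by tallying the statuses directly. The snippet below is only a check of the numbers for Example 1, with illustrative variable names that are not part of the problem statement.

```python
from collections import Counter

# The five requests from Example 1.
example_requests = [
    {'status': 'success', 'latency_ms': 50},
    {'status': 'success', 'latency_ms': 80},
    {'status': 'success', 'latency_ms': 120},
    {'status': 'error', 'latency_ms': 30},
    {'status': 'timeout', 'latency_ms': 5000},
]
sla_ms = 100.0

# Count how many requests ended in each status.
status_counts = Counter(r['status'] for r in example_requests)
total = len(example_requests)
succeeded = status_counts['success']
failed = status_counts['error'] + status_counts['timeout']

# Successful requests at or under the threshold (50 ms and 80 ms).
within_sla = sum(
    1 for r in example_requests
    if r['status'] == 'success' and r['latency_ms'] <= sla_ms
)

latency_pct = round(100 * within_sla / succeeded, 2)  # 2 of 3 successes
error_pct = round(100 * failed / total, 2)            # 2 of 5 requests
overall_pct = round(100 * within_sla / total, 2)      # 2 of 5 requests
print(latency_pct, error_pct, overall_pct)
```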

Starter Code

def calculate_sla_metrics(requests: list, latency_sla_ms: float = 100.0) -> dict:
    """
    Calculate SLA compliance metrics for a model serving endpoint.
    
    Args:
        requests: list of request results, each a dict with 'latency_ms' and 'status'
        latency_sla_ms: maximum acceptable latency in ms for SLA compliance
    
    Returns:
        dict with keys: 'latency_sla_compliance', 'error_rate', 'overall_sla_compliance'
        All values as percentages (0-100), rounded to 2 decimal places.
    """
    pass

Calculate SLA Compliance Metrics for Model Service
