In production ML systems, Service Level Agreement (SLA) monitoring is crucial for ensuring your model serving endpoints meet performance guarantees. Given a list of request results from a model serving endpoint, compute key SLA compliance metrics.
Each request result is a dictionary with:
- 'latency_ms': Response latency in milliseconds (float)
- 'status': Either 'success', 'error', or 'timeout'
Write a function calculate_sla_metrics(requests, latency_sla_ms) that computes:
- Latency SLA Compliance: Percentage of successful requests whose latency was at or below the threshold (latency_ms <= latency_sla_ms)
- Error Rate: Percentage of all requests that resulted in an error or timeout
- Overall SLA Compliance: Percentage of all requests that both succeeded AND met the latency threshold
The function should return a dictionary with these three metrics under the keys 'latency_sla_compliance', 'error_rate', and 'overall_sla_compliance'. If the input list is empty, return an empty dictionary. If there are no successful requests, latency_sla_compliance should be 0.0.
All returned values should be percentages (0-100) rounded to 2 decimal places.
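The three definitions above can be sketched as follows. This is one possible implementation rather than a reference solution. It assumes every request dict carries both keys and that 'status' is always one of the three listed values. The threshold is treated as inclusive, matching the worked example.

```python
def calculate_sla_metrics(requests: list, latency_sla_ms: float = 100.0) -> dict:
    # Empty input: the problem statement asks for an empty dictionary.
    if not requests:
        return {}

    total = len(requests)

    # Partition by status (assumes 'status' is 'success', 'error', or 'timeout').
    successful = [r for r in requests if r['status'] == 'success']
    failed = sum(1 for r in requests if r['status'] in ('error', 'timeout'))

    # Successful requests at or below the latency threshold (inclusive).
    within_sla = sum(1 for r in successful if r['latency_ms'] <= latency_sla_ms)

    # With no successful requests, latency compliance is defined as 0.0.
    if successful:
        latency_sla_compliance = 100.0 * within_sla / len(successful)
    else:
        latency_sla_compliance = 0.0

    error_rate = 100.0 * failed / total
    overall_sla_compliance = 100.0 * within_sla / total

    return {
        'latency_sla_compliance': round(latency_sla_compliance, 2),
        'error_rate': round(error_rate, 2),
        'overall_sla_compliance': round(overall_sla_compliance, 2),
    }
```

Note that the first metric is divided by the number of successful requests, while the other two are divided by the total number of requests.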
Examples
Example 1:
Input:
requests = [{'status': 'success', 'latency_ms': 50}, {'status': 'success', 'latency_ms': 80}, {'status': 'success', 'latency_ms': 120}, {'status': 'error', 'latency_ms': 30}, {'status': 'timeout', 'latency_ms': 5000}]
latency_sla_ms = 100.0
Output:
{'latency_sla_compliance': 66.67, 'error_rate': 40.0, 'overall_sla_compliance': 40.0}
Explanation: Out of 5 total requests, 3 succeeded. Of the 3 successful requests, 2 had latency <= 100ms (50ms and 80ms), giving latency_sla_compliance = 2/3 * 100 = 66.67%. There were 2 failed requests (1 error + 1 timeout), giving error_rate = 2/5 * 100 = 40%. Overall SLA compliance = 2/5 * 100 = 40% (requests that both succeeded AND met the latency threshold).
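The arithmetic in the explanation can be checked directly from the counts. This is only a quick sanity check on the numbers in Example 1, not part of the required solution. The variable names are illustrative.

```python
# Counts taken from Example 1.
total = 5        # all requests
succeeded = 3    # status == 'success'
within_sla = 2   # successful and latency <= 100 ms (50 ms and 80 ms)
failed = 2       # 1 error + 1 timeout

latency_sla_compliance = round(100.0 * within_sla / succeeded, 2)
error_rate = round(100.0 * failed / total, 2)
overall_sla_compliance = round(100.0 * within_sla / total, 2)

print(latency_sla_compliance, error_rate, overall_sla_compliance)
# prints: 66.67 40.0 40.0
```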
Starter Code
def calculate_sla_metrics(requests: list, latency_sla_ms: float = 100.0) -> dict:
    """
    Calculate SLA compliance metrics for a model serving endpoint.

    Args:
        requests: list of request results, each a dict with 'latency_ms' and 'status'
        latency_sla_ms: maximum acceptable latency in ms for SLA compliance

    Returns:
        dict with keys: 'latency_sla_compliance', 'error_rate', 'overall_sla_compliance'
        All values as percentages (0-100), rounded to 2 decimal places.
    """
    pass