Calculate AUC (Area Under ROC Curve)

Medium
MLE Interview Prep

Implement a function to calculate the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) given binary ground truth labels and predicted probability scores.

The AUC-ROC metric measures how well a binary classifier distinguishes between positive and negative classes across all possible classification thresholds. An AUC of 1.0 represents a perfect classifier, 0.5 represents a classifier no better than random guessing, and 0.0 represents a classifier whose rankings are perfectly inverted (every negative example is scored above every positive one).
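Equivalently, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch of this pairwise view (the function name `auc_pairwise` is illustrative, not part of the problem):

```python
import itertools

def auc_pairwise(y_true, y_scores):
    # Split scores by class.
    pos = [s for y, s in zip(y_true, y_scores) if y == 1]
    neg = [s for y, s in zip(y_true, y_scores) if y == 0]
    # Single-class input: AUC is undefined; return 0.0 per the problem note.
    if not pos or not neg:
        return 0.0
    # Count positive-vs-negative "wins"; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n)
               for p, n in itertools.product(pos, neg))
    return wins / (len(pos) * len(neg))
```

This O(P·N) pairwise count is usually too slow for large inputs, but it is a handy mental model and a correctness oracle for the threshold-based implementation.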

Your function should:

  1. Take a list of binary labels (0 or 1) and corresponding prediction scores
  2. Compute the True Positive Rate (TPR) and False Positive Rate (FPR) at each threshold
  3. Calculate the area under the resulting ROC curve
  4. Return the AUC value

Note: If all labels belong to a single class, the AUC is undefined; handle this edge case by returning 0.0.

Examples

Example 1:
Input: y_true = [0, 0, 1, 1], y_scores = [0.1, 0.4, 0.35, 0.8]
Output: 0.75
Explanation: When sorted by scores in descending order: scores=[0.8, 0.4, 0.35, 0.1] and labels=[1, 0, 1, 0]. At each threshold, we compute TPR (true positives / total positives) and FPR (false positives / total negatives). The ROC curve traces points (0,0) -> (0, 0.5) -> (0.5, 0.5) -> (0.5, 1.0) -> (1.0, 1.0). Using trapezoidal integration, the area under this curve equals 0.75.
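The example can be cross-checked against scikit-learn's reference implementation (assuming scikit-learn is available in your environment):

```python
from sklearn.metrics import roc_auc_score

# Same inputs as Example 1.
auc = roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # → 0.75
```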

Starter Code

import numpy as np

def calculate_auc(y_true, y_scores):
    """
    Calculate the Area Under the ROC Curve (AUC).
    
    Args:
        y_true: List or array of binary ground truth labels (0 or 1)
        y_scores: List or array of predicted probabilities or confidence scores
        
    Returns:
        AUC value as a float
    """
    # Your code here
    pass
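One way the starter code could be completed, following steps 1–4 (a sketch, not the only valid approach; it sorts once, groups tied scores so each distinct threshold yields one ROC point, and integrates with the trapezoidal rule):

```python
import numpy as np

def calculate_auc(y_true, y_scores):
    """
    Calculate the Area Under the ROC Curve (AUC).

    Args:
        y_true: List or array of binary ground truth labels (0 or 1)
        y_scores: List or array of predicted probabilities or confidence scores

    Returns:
        AUC value as a float
    """
    y_true = np.asarray(y_true, dtype=float)
    y_scores = np.asarray(y_scores, dtype=float)

    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    # Edge case from the problem note: single-class input -> 0.0.
    if n_pos == 0 or n_neg == 0:
        return 0.0

    # Sort by score, descending; sweep thresholds from high to low.
    order = np.argsort(-y_scores)
    sorted_true = y_true[order]
    sorted_scores = y_scores[order]

    # Cumulative true/false positives after admitting each sample.
    tps = np.cumsum(sorted_true)
    fps = np.cumsum(1 - sorted_true)

    # Keep only the last point of each run of tied scores, so ties
    # produce a single diagonal ROC segment rather than spurious steps.
    distinct = np.r_[sorted_scores[1:] != sorted_scores[:-1], True]
    tpr = np.r_[0.0, tps[distinct] / n_pos]   # curve starts at (0, 0)
    fpr = np.r_[0.0, fps[distinct] / n_neg]

    # Trapezoidal integration of TPR over FPR.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

On Example 1 this traces exactly the points (0,0) → (0,0.5) → (0.5,0.5) → (0.5,1.0) → (1.0,1.0) and returns 0.75.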
The AI Interview - Master AI/ML Interviews