Find the Best Gini-Based Split for a Binary Decision Tree

Medium
Data Science Interview Prep

Implement a function that scans every feature and threshold in a small data set, then returns the split that minimises the weighted Gini impurity. Your implementation should support binary class labels (0 or 1) and handle ties gracefully.

You will write one function:

find_best_split(X: np.ndarray, y: np.ndarray) -> tuple[int, float]
  • X is an n×dn\times d NumPy array of numeric features.
  • y is a length-nn NumPy array of 0/1 labels.
  • The function returns (best_feature_index, best_threshold) for the split with the lowest weighted Gini impurity.
  • If several splits share the same impurity, return the first that you encounter while scanning features and thresholds.

Examples

Example 1:
Input: import numpy as np X = np.array([[2.5],[3.5],[1.0],[4.0]]) y = np.array([0,1,0,1]) print(find_best_split(X, y))
Output: (0, 2.5)
Explanation: Splitting on feature 0 at threshold 2.5 yields two perfectly pure leaves, producing the minimum possible weighted Gini impurity.

Starter Code

import numpy as np
from typing import Tuple

def find_best_split(X: np.ndarray, y: np.ndarray) -> Tuple[int, float]:
    """Return the (feature_index, threshold) that minimises weighted Gini impurity."""
    # ✏️ TODO: implement
    pass
Lines: 1Characters: 0
Ready
The AI Interview - Master AI/ML Interviews