Implement Gradient Descent Variants with MSE Loss

Medium
Deep Learning

In this problem, you will implement a single function that performs three variants of gradient descent: Batch Gradient Descent, Stochastic Gradient Descent (SGD), and Mini-Batch Gradient Descent, all using Mean Squared Error (MSE) as the loss function. A method parameter selects which variant to use.

Requirements

  • Do not shuffle the data; process samples in their original order (index 0, 1, 2, ...)
  • For Batch GD: use all samples to compute a single gradient update per epoch
  • For Stochastic GD: iterate through each sample sequentially (i.e., process sample 0, then 1, then 2, etc.) — not randomly selected
  • For Mini-Batch GD: form batches from consecutive samples without overlap (e.g., for batch_size=2: first batch uses indices [0,1], second batch uses [2,3], etc.)
  • The n_epochs parameter specifies how many complete passes through the dataset to perform
  • For each epoch, process all samples according to the specified method
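All three variants share the same per-batch update rule; only how the batch is selected differs. A minimal sketch of one MSE gradient step (here using the (2/m)-scaled derivative of MSE; some graders drop the factor of 2, so check the expected convention):

```python
import numpy as np

def mse_gradient_step(X_b, y_b, weights, learning_rate):
    # Predictions for this batch
    preds = X_b @ weights
    # Gradient of MSE: (2/m) * X^T (preds - y)
    m = len(y_b)
    grad = (2 / m) * X_b.T @ (preds - y_b)
    return weights - learning_rate * grad

X_b = np.array([[1.0, 1.0], [2.0, 1.0]])
y_b = np.array([2.0, 3.0])
w = mse_gradient_step(X_b, y_b, np.zeros(2), 0.01)
# w ≈ [0.08, 0.05]
```

Batch GD applies this step once per epoch with the full dataset, SGD applies it once per sample, and mini-batch GD applies it once per consecutive slice of `batch_size` samples.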

Examples

Example 1:
Input:

import numpy as np

# Sample data
X = np.array([[1, 1], [2, 1], [3, 1], [4, 1]])
y = np.array([2, 3, 4, 5])

# Parameters
learning_rate = 0.01
n_epochs = 1000
batch_size = 2

# Initialize weights
weights = np.zeros(X.shape[1])

# Test Batch Gradient Descent
final_weights = gradient_descent(X, y, weights, learning_rate, n_epochs, method='batch')

# Test Stochastic Gradient Descent
final_weights = gradient_descent(X, y, weights, learning_rate, n_epochs, method='stochastic')

# Test Mini-Batch Gradient Descent
final_weights = gradient_descent(X, y, weights, learning_rate, n_epochs, batch_size, method='mini_batch')
Output: [float, float] for each of the three calls (one final weight vector per variant)
Explanation: The function should return the final weights after performing the specified variant of gradient descent for the given number of epochs (complete passes through the data).

Starter Code

import numpy as np

def gradient_descent(X, y, weights, learning_rate, n_epochs, batch_size=1, method='batch'):
    """
    Perform gradient descent optimization.
    
    Args:
        X: Feature matrix of shape (m, n)
        y: Target values of shape (m,)
        weights: Initial weights of shape (n,)
        learning_rate: Step size for gradient descent
        n_epochs: Number of complete passes through the dataset
        batch_size: Size of batches for mini-batch gradient descent (default: 1)
        method: Type of gradient descent ('batch', 'stochastic', or 'mini_batch')
    
    Returns:
        Optimized weights
    """
    # Your code here
    pass
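For reference, one possible implementation satisfying the requirements above. It assumes the (2/m)-scaled MSE gradient; if the grader uses a different scaling, adjust the constant accordingly:

```python
import numpy as np

def gradient_descent(X, y, weights, learning_rate, n_epochs, batch_size=1, method='batch'):
    weights = weights.astype(float).copy()  # avoid mutating the caller's array
    m = X.shape[0]
    # The three variants differ only in how many consecutive samples form one update
    if method == 'batch':
        step = m            # one update per epoch using all samples
    elif method == 'stochastic':
        step = 1            # one update per sample, in original order
    elif method == 'mini_batch':
        step = batch_size   # consecutive, non-overlapping batches
    else:
        raise ValueError(f"unknown method: {method}")
    for _ in range(n_epochs):
        for start in range(0, m, step):
            X_b = X[start:start + step]
            y_b = y[start:start + step]
            error = X_b @ weights - y_b
            grad = (2 / len(y_b)) * X_b.T @ error  # MSE gradient for this batch
            weights -= learning_rate * grad
    return weights
```

With the example data (y = x + 1, so the true weights are [1, 1]), all three variants should converge close to [1, 1] after 1000 epochs at learning_rate = 0.01.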
The AI Interview - Master AI/ML Interviews