The AI Interview - Master AI/ML Interviews

Implement the epsilon-greedy method for action selection in an n-armed bandit problem. Given an array of estimated action values (Q-values), select an action using the epsilon-greedy policy: with probability epsilon, choose an action uniformly at random (exploration); with probability 1 - epsilon, choose the action with the highest estimated value (exploitation).
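As a quick sanity check on the policy, the resulting selection probabilities can be written out. This is a sketch under two assumptions that the problem statement does not spell out: the random branch samples uniformly over all n actions (including the greedy one), and the maximum Q-value is unique. The symbol a* for the greedy action is introduced here for illustration.

```latex
% a^* = \arg\max_a Q(a) is the greedy action; n is the number of arms.
\[
P(a = a^*) = (1 - \varepsilon) + \frac{\varepsilon}{n},
\qquad
P(a = a') = \frac{\varepsilon}{n} \quad \text{for each } a' \neq a^*.
\]
% Worked case: n = 3, \varepsilon = 0.1
\[
P(a = a^*) = 0.9 + \frac{0.1}{3} \approx 0.933,
\qquad
P(a = a') = \frac{0.1}{3} \approx 0.033.
\]
```

Under these assumptions the greedy action is picked slightly more often than 1 - epsilon, because the random branch can also land on it.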

Examples

Example 1:

Input:

Q = np.array([0.5, 2.3, 1.7])
epsilon = 0.0
action = epsilon_greedy(Q, epsilon)
print(action)

Output: 1

Explanation: With epsilon=0.0 (always greedy), the highest Q-value is 2.3 at index 1, so the function always returns 1.

Starter Code

import numpy as np

def epsilon_greedy(Q, epsilon=0.1):
    """
    Selects an action using epsilon-greedy policy.
    Q: np.ndarray of shape (n,) -- estimated action values
    epsilon: float in [0, 1]
    Returns: int, selected action index
    """
    # Your code here
    pass
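
One possible way to fill in the starter code is sketched below. It is not an official reference solution. It assumes the random branch samples uniformly over all actions and that ties in the greedy branch go to the lowest index (the behavior of np.argmax). The rng parameter is a hypothetical addition that is not part of the starter signature; it is there only to make runs reproducible.

```python
import numpy as np

def epsilon_greedy(Q, epsilon=0.1, rng=None):
    """
    Selects an action using epsilon-greedy policy.
    Q: np.ndarray of shape (n,) -- estimated action values
    epsilon: float in [0, 1]
    rng: optional np.random.Generator (hypothetical extra parameter, for reproducibility)
    Returns: int, selected action index
    """
    rng = np.random.default_rng() if rng is None else rng
    Q = np.asarray(Q)
    # rng.random() is drawn from [0, 1), so epsilon=0.0 never explores
    # and epsilon=1.0 always explores.
    if rng.random() < epsilon:
        # Explore: uniform over all n actions (assumed; includes the greedy one).
        return int(rng.integers(len(Q)))
    # Exploit: index of the largest estimate; ties go to the lowest index.
    return int(np.argmax(Q))

Q = np.array([0.5, 2.3, 1.7])
print(epsilon_greedy(Q, epsilon=0.0))  # prints 1, matching Example 1
```

Comparing a draw from [0, 1) against epsilon with a strict less-than keeps both boundary cases exact, which is why Example 1 is deterministic.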

Epsilon-Greedy Action Selection for n-Armed Bandit