Epsilon-Greedy Action Selection for n-Armed Bandit

Medium
Reinforcement Learning

Implement epsilon-greedy action selection for an n-armed bandit problem. Given an array of estimated action values (Q-values), select an action as follows: with probability epsilon, choose an action uniformly at random (exploration); with probability 1 - epsilon, choose the action with the highest estimated value (exploitation).
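
The rule above can be sketched in a few lines. This is only one possible approach, not a reference solution: the name epsilon_greedy_sketch and the optional rng parameter are assumptions made for illustration, exploration is assumed to draw uniformly over all n actions (including the greedy one), and ties are broken by np.argmax, which returns the first maximum.

```python
import numpy as np

def epsilon_greedy_sketch(Q, epsilon=0.1, rng=None):
    """Illustrative epsilon-greedy selection (hypothetical helper, not the official solution)."""
    # Assumption: the caller may pass a numpy Generator for reproducibility.
    rng = np.random.default_rng() if rng is None else rng
    # rng.random() is uniform on [0, 1), so this branch is taken with probability epsilon.
    if rng.random() < epsilon:
        # Explore: pick any action index uniformly at random.
        return int(rng.integers(len(Q)))
    # Exploit: pick the index of the largest estimated value.
    return int(np.argmax(Q))
```

Passing a seeded generator such as np.random.default_rng(0) makes repeated runs reproducible, which helps when testing the exploratory branch.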

Examples

Example 1:
Input:
Q = np.array([0.5, 2.3, 1.7])
epsilon = 0.0
action = epsilon_greedy(Q, epsilon)
print(action)
Output: 1
Explanation: With epsilon=0.0 (always greedy), the highest Q-value is 2.3 at index 1, so the function always returns 1.
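
To see why the output is deterministic at epsilon=0.0 but not otherwise, it can help to write out the selection probabilities. The sketch below assumes uniform exploration over all n actions, so the greedy action is chosen with probability 1 - epsilon + epsilon/n and every other action with probability epsilon/n. The helper name epsilon_greedy_probs is hypothetical and is not part of the exercise.

```python
import numpy as np

def epsilon_greedy_probs(Q, epsilon):
    """Return the probability of selecting each action (hypothetical helper)."""
    Q = np.asarray(Q, dtype=float)
    n = Q.size
    # Every action receives an equal share of the exploration probability.
    probs = np.full(n, epsilon / n)
    # The greedy action (first maximum) also receives the exploitation probability.
    probs[np.argmax(Q)] += 1.0 - epsilon
    return probs

print(epsilon_greedy_probs([0.5, 2.3, 1.7], 0.0))
print(epsilon_greedy_probs([0.5, 2.3, 1.7], 0.1))
```

With epsilon=0.0 all of the probability mass sits on index 1, which matches Example 1. With epsilon=0.1, index 1 is still selected most of the time, but indices 0 and 2 each keep a small chance.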

Starter Code

import numpy as np

def epsilon_greedy(Q, epsilon=0.1):
    """
    Selects an action using epsilon-greedy policy.
    Q: np.ndarray of shape (n,) -- estimated action values
    epsilon: float in [0, 1]
    Returns: int, selected action index
    """
    # Your code here
    pass