Implement an efficient method to update the mean reward for a k-armed bandit action after receiving each new reward, without storing the full history of rewards. Given the previous mean estimate (Q_prev), the number of times the action has been selected (k), and a new reward (R), compute the updated mean using the incremental formula.
Note: Recomputing a regular mean by storing all past rewards uses memory that grows without bound. Your solution should use only the previous mean, the count, and the new reward.
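As a quick sanity check (not part of the required solution), the incremental update can be compared against the batch mean on a small reward list:

```python
rewards = [2.0, 6.0, 3.0, 7.0]

Q = 0.0
for k, R in enumerate(rewards, start=1):
    Q += (R - Q) / k  # incremental update after the k-th reward

# The incremental result matches the batch mean (up to floating-point error).
print(round(Q, 6))                              # 4.5
print(round(sum(rewards) / len(rewards), 6))    # 4.5
```

Note that the loop never stores more than the running estimate `Q` and the loop counter, which is exactly the property the problem asks for.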
Examples
Example 1:
Input:
Q_prev = 2.0
k = 2
R = 6.0
new_Q = incremental_mean(Q_prev, k, R)
print(round(new_Q, 2))
Output:
4.0
Explanation: The updated mean is Q_prev + (1/k) * (R - Q_prev) = 2.0 + (1/2) * (6.0 - 2.0) = 2.0 + 2.0 = 4.0
Starter Code
def incremental_mean(Q_prev, k, R):
    """
    Q_prev: previous mean estimate (float)
    k: number of times the action has been selected (int)
    R: new observed reward (float)
    Returns: new mean estimate (float)
    """
    # Your code here
    pass
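One possible solution, directly applying the incremental formula from the explanation above (a sketch, not the only valid answer):

```python
def incremental_mean(Q_prev, k, R):
    """
    Q_prev: previous mean estimate (float)
    k: number of times the action has been selected (int)
    R: new observed reward (float)
    Returns: new mean estimate (float)
    """
    # New mean = old mean + (1/k) * (new reward - old mean).
    # Uses O(1) memory: only the previous estimate, the count, and the reward.
    return Q_prev + (R - Q_prev) / k

# Example from the problem statement: mean of [2.0, 6.0] is 4.0
print(round(incremental_mean(2.0, 2, 6.0), 2))  # 4.0
```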