Incremental Mean for Online Reward Estimation

Easy
Reinforcement Learning

Implement an efficient method to update the mean reward for a k-armed bandit action after receiving each new reward, without storing the full history of rewards. Given the previous mean estimate (Q_prev), the number of times the action has been selected including the current one (k), and the new reward (R), compute the updated mean using the incremental formula.

Note: A naive mean that stores every past reward requires memory that grows without bound as rewards accumulate. Your solution should use only the previous mean, the count, and the new reward.

Examples

Example 1:
Input:
Q_prev = 2.0, k = 2, R = 6.0
new_Q = incremental_mean(Q_prev, k, R)
print(round(new_Q, 2))
Output: 4.0
Explanation: The updated mean is Q_prev + (1/k) * (R - Q_prev) = 2.0 + (1/2)*(6.0 - 2.0) = 2.0 + 2.0 = 4.0

Starter Code

def incremental_mean(Q_prev, k, R):
    """
    Q_prev: previous mean estimate (float)
    k: number of times the action has been selected (int)
    R: new observed reward (float)
    Returns: new mean estimate (float)
    """
    # Your code here
    pass
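
One possible solution is a direct translation of the update rule above, Q_new = Q_prev + (1/k) * (R - Q_prev). This is a sketch, assuming k counts the current selection as in the worked example:

```python
def incremental_mean(Q_prev, k, R):
    # Incremental update: shift the old estimate toward the new reward
    # by the step size 1/k. Uses O(1) memory no matter how many rewards
    # have been observed.
    return Q_prev + (R - Q_prev) / k

# Example from above: previous mean 2.0, second selection, reward 6.0
print(round(incremental_mean(2.0, 2, 6.0), 2))  # → 4.0
```

Feeding rewards through this update one at a time, starting from Q = 0.0 with k = 1, 2, 3, ..., reproduces the ordinary sample average without ever storing the reward history.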
The AI Interview - Master AI/ML Interviews