The AI Interview - Master AI/ML Interviews

Given an initial value $Q_1$, a list of $k$ observed rewards $R_1, R_2, \ldots, R_k$, and a step size $\alpha$, implement a function that computes the exponentially weighted average as:

$(1-\alpha)^k Q_1 + \sum_{i=1}^k \alpha (1-\alpha)^{k-i} R_i$

This quantity is called the exponential recency-weighted average: it gives more weight to recent rewards, while the influence of the initial estimate $Q_1$ decays geometrically as $(1-\alpha)^k$. Do not use running/incremental updates; instead, compute the result directly from the formula.
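
To make the weighting concrete, the short sketch below lists the coefficients that the formula places on $Q_1$ and on each reward. The helper name `recency_weights` is hypothetical and not part of the problem; it only illustrates that later rewards receive larger weights and that all weights sum to 1.

```python
def recency_weights(k, alpha):
    """Return (weight on Q_1, [weights on R_1..R_k]) taken from the formula.

    Hypothetical helper for illustration only; not part of the problem statement.
    """
    w_initial = (1 - alpha) ** k
    w_rewards = [alpha * (1 - alpha) ** (k - i) for i in range(1, k + 1)]
    return w_initial, w_rewards


w_initial, w_rewards = recency_weights(k=3, alpha=0.3)
print(round(w_initial, 4))                    # weight on Q_1: 0.343
print([round(w, 4) for w in w_rewards])       # weights on R_1, R_2, R_3: [0.147, 0.21, 0.3]
print(round(w_initial + sum(w_rewards), 4))   # total weight: 1.0
```

Because the weights sum to 1, the result is a true weighted average of $Q_1$ and the rewards.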

Examples

Example 1:

Input:

Q1 = 2.0
rewards = [5.0, 9.0]
alpha = 0.3
result = exp_weighted_average(Q1, rewards, alpha)
print(round(result, 4))

Output: 4.73

Explanation: With k=2, we compute: (1-0.3)^2 × 2.0 + 0.3×(1-0.3)^1 × 5.0 + 0.3×(1-0.3)^0 × 9.0 = 0.49×2.0 + 0.21×5.0 + 0.3×9.0 = 0.98 + 1.05 + 2.7 = 4.73

Starter Code

def exp_weighted_average(Q1, rewards, alpha):
    """
    Q1: float, initial estimate
    rewards: list or array of rewards, R_1 to R_k
    alpha: float, step size (0 < alpha <= 1)
    Returns: float, exponentially weighted average after k rewards
    """
    # Your code here
    pass
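
One possible way to fill in the starter is sketched below. It evaluates the formula term by term with no running update. The name `exp_weighted_average_direct` is an assumption chosen for illustration, and the sketch assumes `rewards` is a plain Python sequence of numbers.

```python
def exp_weighted_average_direct(Q1, rewards, alpha):
    """Evaluate (1-alpha)^k * Q1 + sum_i alpha * (1-alpha)^(k-i) * R_i directly.

    Illustrative sketch; the function name is hypothetical.
    """
    k = len(rewards)
    # Contribution of the initial estimate, discounted k times.
    total = (1 - alpha) ** k * Q1
    # Contribution of each reward R_i, with i running from 1 to k.
    for i, reward in enumerate(rewards, start=1):
        total += alpha * (1 - alpha) ** (k - i) * reward
    return total


print(round(exp_weighted_average_direct(2.0, [5.0, 9.0], 0.3), 4))  # 4.73
```

With an empty reward list the sketch returns `Q1` unchanged, and with `alpha = 1` it returns the most recent reward, since Python evaluates `0.0 ** 0` as `1.0`.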

Exponential Weighted Average of Rewards
