Given an initial value $Q_1$, a list of $k$ observed rewards $R_1, R_2, \ldots, R_k$, and a step size $\alpha$, implement a function that computes the exponentially weighted average:
$$(1-\alpha)^k Q_1 + \sum_{i=1}^{k} \alpha (1-\alpha)^{k-i} R_i$$
This weighting gives more importance to recent rewards, while the influence of the initial estimate $Q_1$ decays geometrically as $(1-\alpha)^k$. Do not use running/incremental updates; compute the result directly from the formula. (This is called the exponential recency-weighted average.)
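As a quick sanity check on the formula, the coefficient on the initial estimate and the coefficients on the rewards are nonnegative and sum to 1, which is what makes the result a weighted average. A minimal sketch follows; `recency_weights` is a hypothetical helper name used only for illustration and is not part of the required solution.

```python
def recency_weights(k, alpha):
    """Return (weight on Q1, [weights on R_1..R_k]) as given by the formula.

    Hypothetical helper for illustration; not part of the required solution.
    """
    w_q1 = (1 - alpha) ** k
    w_rewards = [alpha * (1 - alpha) ** (k - i) for i in range(1, k + 1)]
    return w_q1, w_rewards


w_q1, w_rewards = recency_weights(k=5, alpha=0.3)
total = w_q1 + sum(w_rewards)

# The weights form a convex combination (up to floating-point error).
assert abs(total - 1.0) < 1e-12
# Later rewards get larger weights: the list is strictly increasing
# whenever 0 < alpha < 1.
assert all(a < b for a, b in zip(w_rewards, w_rewards[1:]))
```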
Examples
Example 1:
Input:
Q1 = 2.0
rewards = [5.0, 9.0]
alpha = 0.3
result = exp_weighted_average(Q1, rewards, alpha)
print(round(result, 4))
Output:
4.73
Explanation: With k=2, we compute: (1-0.3)^2 × 2.0 + 0.3×(1-0.3)^1 × 5.0 + 0.3×(1-0.3)^0 × 9.0 = 0.49×2.0 + 0.21×5.0 + 0.3×9.0 = 0.98 + 1.05 + 2.7 = 4.73
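The arithmetic in the explanation can be reproduced term by term. This is only a sketch for checking the numbers; the variable names `q1_term` and `reward_terms` are chosen here for illustration.

```python
Q1 = 2.0
rewards = [5.0, 9.0]
alpha = 0.3
k = len(rewards)

# Contribution of the initial estimate: (1 - alpha)^k * Q1
q1_term = (1 - alpha) ** k * Q1

# Contribution of each reward R_i: alpha * (1 - alpha)^(k - i) * R_i
reward_terms = [
    alpha * (1 - alpha) ** (k - i) * r
    for i, r in enumerate(rewards, start=1)
]

print(round(q1_term, 4))                      # 0.98
print([round(t, 4) for t in reward_terms])    # [1.05, 2.7]
print(round(q1_term + sum(reward_terms), 4))  # 4.73
```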
Starter Code
def exp_weighted_average(Q1, rewards, alpha):
    """
    Q1: float, initial estimate
    rewards: list or array of rewards, R_1 to R_k
    alpha: float, step size (0 < alpha <= 1)
    Returns: float, exponentially weighted average after k rewards
    """
    # Your code here
    pass
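One possible way to fill in the starter code is sketched below. It evaluates the sum directly, as the problem requires, and is not presented as a reference solution. The second function, `incremental_estimate`, is a hypothetical helper included only as a cross-check: repeatedly applying the update Q ← Q + α(R − Q) unrolls to the same closed-form expression.

```python
def exp_weighted_average(Q1, rewards, alpha):
    """Directly evaluate (1-alpha)^k * Q1 + sum_i alpha*(1-alpha)^(k-i) * R_i."""
    k = len(rewards)
    value = (1 - alpha) ** k * Q1
    for i, r in enumerate(rewards, start=1):
        value += alpha * (1 - alpha) ** (k - i) * r
    return float(value)


def incremental_estimate(Q1, rewards, alpha):
    """Cross-check only: the running update Q <- Q + alpha * (R - Q)."""
    q = Q1
    for r in rewards:
        q += alpha * (r - q)
    return q


result = exp_weighted_average(2.0, [5.0, 9.0], 0.3)
print(round(result, 4))  # 4.73

# Both routes agree up to floating-point error.
assert abs(result - incremental_estimate(2.0, [5.0, 9.0], 0.3)) < 1e-12
```

With an empty reward list this sketch returns Q1, and with alpha = 1 it returns the last reward, because Python evaluates `0.0 ** 0` as `1.0`.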