Write a function that computes the discounted return $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$ for a given sequence of rewards and discount factor gamma. In reinforcement learning, the state-value function $v_\pi(s)$ is the expected value of this quantity, $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$, as defined by the equation in the image. Only use NumPy.
Examples
Example 1:
Input:
rewards = [1, 2, 3, 4]
gamma = 0.9
print(discounted_return(rewards, gamma))
Output:
8.146
Explanation: G = 1 + 0.9*2 + 0.9^2*3 + 0.9^3*4 = 1 + 1.8 + 2.43 + 2.916 = 8.146
Starter Code
import numpy as np

def discounted_return(rewards, gamma):
    """
    Compute the discounted return for a given list of rewards.

    Args:
        rewards (list of float): sequence of rewards R_{t+1}, R_{t+2}, ...
        gamma (float): discount factor (0 <= gamma <= 1)

    Returns:
        float: discounted return G_t
    """
    # Your code here
    pass
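
One possible way to fill in the starter code is sketched below. It is not an official solution. It assumes the rewards list is finite, so the infinite sum is truncated at the last reward, and it assumes an empty list should give a return of 0.0. The idea is to build the vector of discount weights gamma^0, gamma^1, ... with NumPy and take its dot product with the rewards.

```python
import numpy as np

def discounted_return(rewards, gamma):
    # Convert to a float array so plain lists and tuples both work.
    rewards = np.asarray(rewards, dtype=float)
    # Discount weights gamma^0, gamma^1, ..., gamma^(T-1) for T rewards.
    discounts = gamma ** np.arange(rewards.size)
    # The weighted sum of rewards is the (truncated) discounted return.
    return float(np.dot(discounts, rewards))

# Example 1 from the problem statement, rounded to three decimals.
print(round(discounted_return([1, 2, 3, 4], 0.9), 3))  # 8.146
```

With gamma = 0 only the first reward counts, and with gamma = 1 the result is the plain sum of the rewards, which makes both values handy sanity checks.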