Calculate the Discounted Return for a Given Trajectory

Easy
Reinforcement Learning

Write a function that computes the discounted return $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$ for a given sequence of rewards and a discount factor gamma. In reinforcement learning, the expected value of this return from state $s$ under policy $\pi$ defines the state-value function $v_\pi(s)$. Only use NumPy.

Examples

Example 1:
Input: rewards = [1, 2, 3, 4], gamma = 0.9
print(discounted_return(rewards, gamma))
Output: 8.146
Explanation: G = 1 + 0.9*2 + 0.9^2*3 + 0.9^3*4 = 1 + 1.8 + 2.43 + 2.916 = 8.146
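The arithmetic above can be verified with a short NumPy snippet (variable names here are illustrative, not part of the required solution):

```python
import numpy as np

rewards = np.array([1, 2, 3, 4], dtype=float)
gamma = 0.9

# Discount factors gamma^0, gamma^1, gamma^2, gamma^3 = [1, 0.9, 0.81, 0.729]
discounts = gamma ** np.arange(len(rewards))

# Dot product gives 1 + 1.8 + 2.43 + 2.916
G = np.dot(discounts, rewards)
print(round(G, 3))  # 8.146
```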

Starter Code

import numpy as np

def discounted_return(rewards, gamma):
    """
    Compute the discounted return for a given list of rewards.
    Args:
      rewards (list of float): sequence of rewards R_{t+1}, R_{t+2}, ...
      gamma (float): discount factor (0 <= gamma <= 1)
    Returns:
      float: discounted return G_t
    """
    # Your code here
    pass
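One possible way to fill in the starter code is a vectorized dot product between the rewards and the corresponding powers of gamma; this is a sketch, not the only valid solution:

```python
import numpy as np

def discounted_return(rewards, gamma):
    """
    Compute the discounted return G_t = sum_k gamma^k * R_{t+k+1}
    for a finite list of rewards.
    """
    rewards = np.asarray(rewards, dtype=float)
    # Discount factors gamma^0, gamma^1, ..., gamma^{n-1}
    discounts = gamma ** np.arange(rewards.size)
    # G_t as a dot product of discounts and rewards
    return float(np.dot(discounts, rewards))

print(round(discounted_return([1, 2, 3, 4], 0.9), 3))  # 8.146
```

Building the discount vector with `np.arange` keeps the computation O(n) and avoids an explicit Python loop.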
The AI Interview - Master AI/ML Interviews