The AI Interview - Master AI/ML Interviews

Write a function that computes the discounted return $G_t = \sum_{k=0}^\infty \gamma^k R_{t+k+1}$ for a given sequence of rewards and discount factor gamma. This quantity corresponds to the expected return $v_\pi(s)$ in reinforcement learning, as defined by the equation in the image. Only use NumPy.

Examples

Example 1:

Input:

rewards = [1, 2, 3, 4]
gamma = 0.9
print(discounted_return(rewards, gamma))

Output: 8.146

Explanation: G = 1 + 0.9*2 + 0.9^2*3 + 0.9^3*4 = 1 + 1.8 + 2.43 + 2.916 = 8.146

Starter Code

import numpy as np

def discounted_return(rewards, gamma):
    """
    Compute the discounted return for a given list of rewards.
    Args:
      rewards (list of float): sequence of rewards R_{t+1}, R_{t+2}, ...
      gamma (float): discount factor (0 <= gamma <= 1)
    Returns:
      float: discounted return G_t
    """
    # Your code here
    pass

Calculate the Discounted Return for a Given Trajectory

Examples

Starter Code