The AI Interview - Master AI/ML Interviews

Write a function that computes the discounted return of a reward sequence for a given discount factor gamma. The function takes a list or NumPy array of rewards [r_0, r_1, ..., r_{T-1}] and a discount factor gamma (0 < gamma <= 1), and returns the total discounted return as a scalar: G = r_0 + gamma*r_1 + gamma^2*r_2 + ... + gamma^(T-1)*r_{T-1}. Use only NumPy.

Examples

Example 1:

Input:

rewards = [1, 1, 1]
gamma = 0.5
print(discounted_return(rewards, gamma))

Output: 1.75

Explanation: Each reward r_t is weighted by gamma^t, so the discounted return is 1*0.5^0 + 1*0.5^1 + 1*0.5^2 = 1 + 0.5 + 0.25 = 1.75
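
The arithmetic in the explanation can be reproduced term by term. The snippet below is only an illustrative check, not part of the original problem; the variable names `weights`, `terms`, and `total` are chosen here for the example.

```python
import numpy as np

rewards = np.asarray([1, 1, 1], dtype=float)
gamma = 0.5

# Discount weight gamma**t for each time step t = 0, 1, 2
weights = gamma ** np.arange(len(rewards))

# Per-step discounted rewards, then their sum
terms = weights * rewards
total = float(terms.sum())

print(weights)  # [1.   0.5  0.25]
print(total)    # 1.75
```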

Starter Code

import numpy as np

def discounted_return(rewards, gamma):
    """
    Compute the total discounted return for a sequence of rewards.
    Args:
        rewards (list or np.ndarray): List or array of rewards [r_0, r_1, ..., r_{T-1}]
        gamma (float): Discount factor (0 < gamma <= 1)
    Returns:
        float: Total discounted return
    """
    # Your code here
    pass
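
One possible way to fill in the starter code is sketched below. It is a minimal vectorized approach rather than a reference solution: it builds the weights gamma^t with `np.power` and takes a dot product with the rewards. Returning 0.0 for an empty reward sequence is an assumption made here; the problem statement does not specify that case.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Sketch: sum of gamma**t * r_t over all time steps t."""
    # Accept lists or arrays; flatten to a 1-D float array
    rewards = np.asarray(rewards, dtype=float).ravel()

    # Assumption: an empty reward sequence has a return of 0.0
    if rewards.size == 0:
        return 0.0

    # Discount weights [gamma**0, gamma**1, ..., gamma**(T-1)]
    discounts = np.power(gamma, np.arange(rewards.size))

    # Weighted sum of rewards, converted to a plain Python float
    return float(np.dot(discounts, rewards))

print(discounted_return([1, 1, 1], 0.5))  # 1.75
```

With gamma = 1 the weights are all 1, so the result reduces to the plain sum of the rewards. For very long sequences with small gamma, the later weights underflow toward zero, which is harmless for this sum.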

Compute Discounted Return
