Implement the Bellman Equation for Value Iteration

Medium
Reinforcement Learning

Write a function that performs one step of value iteration for a given Markov Decision Process (MDP) using the Bellman equation. The function should update the state-value function V(s) for each state based on possible actions, transition probabilities, rewards, and the discount factor gamma. Only use NumPy.
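The update described above is the standard Bellman optimality backup, written here with the episode-termination flag folded in (when done is true, the successor value is dropped):

```latex
V_{k+1}(s) = \max_{a} \sum_{(p,\, s',\, r,\, \mathrm{done}) \in T(s, a)} p \left[ r + \gamma \, (1 - \mathrm{done}) \, V_k(s') \right]
```

Here T(s, a) is the list of transition tuples transitions[s][a], p is the transition probability, s' the next state, and r the immediate reward.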

Examples

Example 1:
Input:
import numpy as np

transitions = [
    {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, False)]},
    {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]}
]
V = np.array([0.0, 0.0])
gamma = 0.9
new_V = bellman_update(V, transitions, gamma)
print(np.round(new_V, 2))
Output: [1. 1.]
Explanation: For state 0, the best action is action 1, which moves to state 1 with reward 1; since V is initialized to zero, the backed-up value is 1. For state 1, action 1 gives a reward of 1 and ends the episode (done is True), so no discounted future value is added and its value is 1.

Starter Code

import numpy as np

def bellman_update(V, transitions, gamma):
    """
    Perform one step of value iteration using the Bellman equation.
    Args:
      V: np.ndarray, state values, shape (n_states,)
      transitions: list of dicts. transitions[s][a] is a list of (prob, next_state, reward, done)
      gamma: float, discount factor
    Returns:
      np.ndarray, updated state values
    """
    # TODO: Implement Bellman update
    pass
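One way to complete the starter code is the sketch below. It loops over states, computes the action-value Q(s, a) for each action by summing probability-weighted backups over the listed outcome tuples, treats done transitions as terminal (no bootstrapping from V[next_state]), and takes the max over actions:

```python
import numpy as np

def bellman_update(V, transitions, gamma):
    """One step of value iteration: V(s) <- max_a sum_outcomes p * (r + gamma * V(s'))."""
    new_V = np.zeros_like(V, dtype=float)
    for s, actions in enumerate(transitions):
        q_values = []
        for a, outcomes in actions.items():
            q = 0.0
            for prob, next_state, reward, done in outcomes:
                # Terminal transitions contribute no future value.
                q += prob * (reward + (0.0 if done else gamma * V[next_state]))
            q_values.append(q)
        new_V[s] = max(q_values)
    return new_V

# Reproduce the example from the problem statement:
transitions = [
    {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, False)]},
    {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]}
]
V = np.array([0.0, 0.0])
print(np.round(bellman_update(V, transitions, 0.9), 2))  # [1. 1.]
```

Repeatedly applying this update converges to the optimal value function for gamma < 1; the single-step version above is exactly what the problem asks for.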
The AI Interview - Master AI/ML Interviews