Implement First-Visit Monte Carlo prediction for estimating state values. Monte Carlo methods learn directly from complete episodes of experience without bootstrapping. In first-visit MC, we estimate the value of a state as the average of returns following the first visit to that state in each episode. Your task is to process episodes and compute state value estimates.
Examples
Example 1:
Input:
episodes=[[(0,1), (1,1), (0,1), (2,0)]], n_states=3, gamma=1.0
Output:
V[0]=3.0, V[1]=2.0, V[2]=0.0
Explanation: State 0 is first visited at t=0; the return from t=0 is 1+1+1+0=3. State 1 is first visited at t=1; the return from t=1 is 1+1+0=2. State 2 is first visited at t=3; its return is 0. State 0's second visit at t=2 is ignored.
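The return arithmetic in the explanation above can be verified with a short hand-check (this is an illustration, not part of the required solution): accumulating rewards backward through the episode yields the return from each time step.

```python
# Hand-check of Example 1: accumulate returns backward through the episode.
episode = [(0, 1), (1, 1), (0, 1), (2, 0)]
gamma = 1.0

G = 0.0
returns = []  # (state, return from that time step)
for state, reward in reversed(episode):
    G = reward + gamma * G
    returns.append((state, G))
returns.reverse()

print(returns)  # [(0, 3.0), (1, 2.0), (0, 1.0), (2, 0.0)]
```

The first-visit values are then read off at each state's earliest time step: V[0]=3.0 (t=0), V[1]=2.0 (t=1), V[2]=0.0 (t=3); the return 1.0 at state 0's second visit (t=2) is discarded.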
Starter Code
import numpy as np

def first_visit_mc_prediction(
    episodes: list[list[tuple[int, float]]],
    n_states: int,
    gamma: float
) -> np.ndarray:
    """
    Estimate state values using First-Visit Monte Carlo prediction.

    Args:
        episodes: List of episodes. Each episode is a list of (state, reward) tuples.
            The reward at index i is the reward received AFTER leaving state i.
        n_states: Number of states (states are integers 0 to n_states-1)
        gamma: Discount factor

    Returns:
        V: Estimated state values as numpy array of shape (n_states,)
    """
    # Your code here
    pass
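One possible reference sketch (one way to fill in the starter code, not the only correct solution): record each state's first visit time per episode, sweep the episode backward accumulating the discounted return, and credit a return to a state only when the current time step is that state's first visit. Values are the per-state average of credited returns; states never visited are left at 0.

```python
import numpy as np

def first_visit_mc_prediction(
    episodes: list[list[tuple[int, float]]],
    n_states: int,
    gamma: float,
) -> np.ndarray:
    returns_sum = np.zeros(n_states)
    returns_count = np.zeros(n_states)
    for episode in episodes:
        # Earliest time step at which each state appears in this episode.
        first_visit: dict[int, int] = {}
        for t, (state, _) in enumerate(episode):
            if state not in first_visit:
                first_visit[state] = t
        # Accumulate discounted returns backward; credit first visits only.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = reward + gamma * G
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    # Average returns per state; unvisited states keep value 0.
    V = np.zeros(n_states)
    visited = returns_count > 0
    V[visited] = returns_sum[visited] / returns_count[visited]
    return V
```

On Example 1 this returns [3.0, 2.0, 0.0], matching the expected output. The backward sweep keeps the whole procedure O(T) per episode, since each return is built from the next one rather than re-summed.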