First-Visit Monte Carlo Prediction

Medium
Reinforcement Learning

Implement First-Visit Monte Carlo prediction for estimating state values. Monte Carlo methods learn directly from complete episodes of experience without bootstrapping. In first-visit MC, we estimate the value of a state as the average of returns following the first visit to that state in each episode. Your task is to process episodes and compute state value estimates.

Examples

Example 1:
Input: episodes=[[(0,1), (1,1), (0,1), (2,0)]], gamma=1.0
Output: V[0]=3.0, V[1]=2.0, V[2]=0.0
Explanation: State 0 is first visited at t=0. Return from t=0: 1+1+1+0=3. State 1 is first visited at t=1. Return from t=1: 1+1+0=2. State 2 first visited at t=3. Return: 0. State 0's second visit at t=2 is ignored.

Starter Code

import numpy as np

def first_visit_mc_prediction(
    episodes: list[list[tuple[int, float]]],
    n_states: int,
    gamma: float
) -> np.ndarray:
    """
    Estimate state values using First-Visit Monte Carlo prediction.
    
    Args:
        episodes: List of episodes. Each episode is a list of (state, reward) tuples.
                 The reward at index i is the reward received AFTER leaving the state at index i.
        n_states: Number of states (states are integers 0 to n_states-1)
        gamma: Discount factor
        
    Returns:
        V: Estimated state values as numpy array of shape (n_states,)
    """
    # Your code here
    pass
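
One possible solution sketch (assumptions: a single backward pass per episode to accumulate the discounted return G, and unvisited states left at 0.0):

```python
import numpy as np

def first_visit_mc_prediction(
    episodes: list[list[tuple[int, float]]],
    n_states: int,
    gamma: float
) -> np.ndarray:
    returns_sum = np.zeros(n_states)
    returns_count = np.zeros(n_states)
    for episode in episodes:
        # Record the time step of the FIRST visit to each state in this episode.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            if state not in first_visit:
                first_visit[state] = t
        # Walk the episode backwards, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = reward + gamma * G
            # Only count the return if this is the state's first visit.
            if first_visit[state] == t:
                returns_sum[state] += G
                returns_count[state] += 1
    # Average the first-visit returns; states never visited stay at 0.0.
    V = np.zeros(n_states)
    visited = returns_count > 0
    V[visited] = returns_sum[visited] / returns_count[visited]
    return V
```

On Example 1 this yields V = [3.0, 2.0, 0.0]: the backward pass computes G = 3 at t=0 (state 0's first visit), G = 2 at t=1 (state 1's first visit), and G = 0 at t=3 (state 2's first visit), while state 0's second visit at t=2 is skipped.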
The AI Interview - Master AI/ML Interviews