Monte Carlo Tree Search

Hard
Reinforcement Learning

Implement Monte Carlo Tree Search (MCTS) for a simple sequential game. The game: start at a number, players alternate adding either 1 or 2, and whoever reaches exactly the target value wins (going over means you lose). MCTS should explore possible moves through random simulations and return the best first action. The algorithm has four phases: Selection (pick promising nodes), Expansion (add new nodes), Simulation (random playout), and Backpropagation (update statistics).
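Before implementing the search, it helps to pin down the terminal rule. A minimal sketch of the move dynamics (the `step` helper is illustrative only; it is not part of the starter code):

```python
def step(state: int, action: int, max_value: int) -> tuple:
    """Apply one move. Returns (new_state, outcome), where outcome is
    'win' if the mover lands exactly on max_value, 'loss' if the mover
    overshoots it, and None if the game continues."""
    new_state = state + action
    if new_state == max_value:
        return new_state, 'win'
    if new_state > max_value:
        return new_state, 'loss'
    return new_state, None

# From state 1 with target 2: +1 wins on the spot, +2 overshoots.
# step(1, 1, 2) -> (2, 'win'); step(1, 2, 2) -> (3, 'loss')
```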

Examples

Example 1:
Input: initial_state=1, max_value=2, iterations=1000, seed=42
Output: 2
Explanation: Starting at 1 with target 2, the player to move can choose +1 (reaching 2 and winning immediately) or +2 (reaching 3, overshooting and losing). Intuitively +1 looks best, yet the expected output here is 2: MCTS treats this as a two-player adversarial game in which the opponent also plays, and the action with the best simulated win rate depends on how values are backed up through the alternating-player game tree.

Starter Code

import numpy as np
from typing import Optional

class MCTSNode:
	def __init__(self, state: int, parent: Optional['MCTSNode'] = None):
		self.state = state
		self.parent = parent
		self.children = {}
		self.visits = 0
		self.value = 0.0
	
	def is_leaf(self):
		return len(self.children) == 0
	
	def ucb1(self, c: float = 1.414) -> float:
		"""Upper Confidence Bound for Trees."""
		# Your code here
		pass

def mcts_search(initial_state: int, max_value: int, iterations: int, seed: int = 42) -> int:
	"""
	Monte Carlo Tree Search for sequential game.
	
	Args:
		initial_state: Starting value
		max_value: Target value to reach
		iterations: Number of MCTS iterations
		seed: Random seed
	
	Returns:
		Best action (1 or 2)
	"""
	# Your code here
	pass
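One way to fill in the starter code is sketched below. It uses random playouts and backs up wins from the perspective of the player who moved into each node; this scoring convention is an assumption, and with it the action returned for Example 1's inputs can be 1 rather than the listed 2, since which move looks best at the root depends on the backup perspective.

```python
import math
import random
from typing import Optional

class MCTSNode:
    def __init__(self, state: int, parent: Optional['MCTSNode'] = None):
        self.state = state
        self.parent = parent
        self.children = {}   # action (1 or 2) -> child node
        self.visits = 0
        self.value = 0.0     # wins credited to the player who moved INTO this node

    def is_leaf(self):
        return len(self.children) == 0

    def ucb1(self, c: float = 1.414) -> float:
        # Unvisited children are explored first; otherwise exploitation + exploration.
        if self.visits == 0:
            return float('inf')
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_search(initial_state: int, max_value: int, iterations: int, seed: int = 42) -> int:
    rng = random.Random(seed)
    root = MCTSNode(initial_state)
    terminal = lambda s: s >= max_value

    for _ in range(iterations):
        # 1. Selection: descend by UCB1 until reaching a leaf or terminal node.
        node, depth = root, 0
        while not node.is_leaf() and not terminal(node.state):
            node = max(node.children.values(), key=lambda ch: ch.ucb1())
            depth += 1
        # 2. Expansion: add both actions, then step into a random child.
        if not terminal(node.state):
            for a in (1, 2):
                node.children[a] = MCTSNode(node.state + a, parent=node)
            node = node.children[rng.choice((1, 2))]
            depth += 1
        # 3. Simulation: random playout until a terminal state is reached.
        state, moves = node.state, depth
        while not terminal(state):
            state += rng.choice((1, 2))
            moves += 1
        # Landing exactly on max_value wins for whoever moved last; overshooting loses.
        last_mover_is_root = (moves % 2 == 1)   # root player makes moves 1, 3, 5, ...
        root_won = (state == max_value) == last_mover_is_root
        # 4. Backpropagation: credit each node from its mover's perspective.
        d = depth
        while node is not None:
            node.visits += 1
            mover_is_root = (d % 2 == 1)
            node.value += 1.0 if root_won == mover_is_root else 0.0
            node, d = node.parent, d - 1
        # (the root's own value, at d == 0, is never used for action selection)

    # The most-visited root action is the standard final choice.
    return max(root.children, key=lambda a: root.children[a].visits)
```

On clear-cut positions this behaves as expected: from state 1 with target 3, the search concentrates visits on +2 (an immediate win), and from state 2 with target 3 it concentrates on +1.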
The AI Interview - Master AI/ML Interviews