Implement Monte Carlo Tree Search (MCTS) for a simple sequential game. The game: start at a number, players alternate adding either 1 or 2, and whoever reaches exactly the target value wins (going over means you lose). MCTS should explore possible moves through random simulations and return the best first action. The algorithm has four phases: Selection (pick promising nodes), Expansion (add new nodes), Simulation (random playout), and Backpropagation (update statistics).
Examples
Example 1:
Input:
initial_state=1, max_value=2, iterations=1000, seed=42Output:
2Explanation: Starting at 1 with goal 2, the player can choose +1 (reaching 2, winning immediately) or +2 (reaching 3, going over and losing). However, MCTS considers this as a two-player game where the opponent also plays optimally. Through simulation, it discovers that the action leading to the best win rate is 2, likely due to how the adversarial game tree unfolds with alternating players.
Starter Code
import numpy as np
from typing import Optional
class MCTSNode:
def __init__(self, state: int, parent: Optional['MCTSNode'] = None):
self.state = state
self.parent = parent
self.children = {}
self.visits = 0
self.value = 0.0
def is_leaf(self):
return len(self.children) == 0
def ucb1(self, c: float = 1.414) -> float:
"""Upper Confidence Bound for Trees."""
# Your code here
pass
def mcts_search(initial_state: int, max_value: int, iterations: int, seed: int = 42) -> int:
"""
Monte Carlo Tree Search for sequential game.
Args:
initial_state: Starting value
max_value: Target value to reach
iterations: Number of MCTS iterations
seed: Random seed
Returns:
Best action (1 or 2)
"""
# Your code here
passPython3
ReadyLines: 1Characters: 0
Ready