Gambler's Problem: Value Iteration

Hard
Reinforcement Learning

A gambler has the chance to bet on a sequence of coin flips. If the coin lands heads, the gambler wins the amount staked; if tails, the gambler loses the stake. The goal is to reach 100, starting from a given capital s (with 0 < s < 100). The game ends when the gambler reaches 0 (bankruptcy) or 100 (goal). On each flip, the gambler can bet any integer amount from 1 up to min(s, 100 - s).

The probability of heads is p_h (known). Reward is +1 if the gambler reaches 100 in a transition, 0 otherwise.
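The value-iteration backup for this undiscounted MDP can be written compactly by folding the +1 goal reward into a terminal value V(100) = 1 (a common convention for this problem; equivalently, the reward appears explicitly on transitions into 100):

```latex
V(s) \leftarrow \max_{a \in \{1,\dots,\min(s,\,100-s)\}}
\bigl[\, p_h\, V(s+a) + (1-p_h)\, V(s-a) \,\bigr],
\qquad V(0) = 0,\quad V(100) = 1.
```

The sweep repeats over s = 1, ..., 99 until the largest change in any V(s) falls below theta.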

Your Task: Write a function gambler_value_iteration(ph, theta=1e-9) that:

  • Computes the optimal state-value function V(s) for all s = 1, ..., 99 using value iteration.
  • Returns the optimal policy as a mapping from state s to the optimal stake a* (any optimal stake may be returned if there are ties).

Inputs:

  • ph: probability of heads (float between 0 and 1)
  • theta: threshold for value iteration convergence (default 1e-9)

Returns:

  • V: array/list of length 101, where V[s] is the value of state s
  • policy: array/list of length 101, where policy[s] is the optimal stake in state s (0 if s = 0 or s = 100)

Examples

Example 1:
Input:
ph = 0.4
V, policy = gambler_value_iteration(ph)
print(round(V[50], 4))
Output: 0.4
Explanation: From state 50, betting the full 50 reaches 100 with probability ph = 0.4, and no policy does better when ph < 0.5 (bold play is optimal in a subfair game), so V(50) = 0.4. One optimal stake at state 50 is 50; other stakes may tie, in which case any optimal stake is acceptable.

Starter Code

def gambler_value_iteration(ph, theta=1e-9):
    """
    Computes the optimal value function and policy for the Gambler's Problem.
    Args:
      ph: probability of heads
      theta: convergence threshold
    Returns:
      V: list of values for all states 0..100
      policy: list of optimal stakes for all states 0..100
    """
    # Your code here
    pass
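A minimal reference sketch, assuming the terminal-value convention V(100) = 1 (which folds the +1 reward for reaching the goal into the bootstrap target) and breaking ties by the smallest optimal stake:

```python
import numpy as np

def gambler_value_iteration(ph, theta=1e-9):
    """Value iteration for the Gambler's Problem over states 0..100."""
    V = np.zeros(101)
    V[100] = 1.0  # terminal value stands in for the +1 reward on reaching 100

    # Sweep until the largest per-state change drops below theta.
    while True:
        delta = 0.0
        for s in range(1, 100):
            old = V[s]
            # One-step lookahead over all legal stakes 1..min(s, 100-s).
            V[s] = max(ph * V[s + a] + (1 - ph) * V[s - a]
                       for a in range(1, min(s, 100 - s) + 1))
            delta = max(delta, abs(V[s] - old))
        if delta < theta:
            break

    # Greedy policy extraction; terminal states 0 and 100 get stake 0.
    policy = [0] * 101
    for s in range(1, 100):
        returns = [ph * V[s + a] + (1 - ph) * V[s - a]
                   for a in range(1, min(s, 100 - s) + 1)]
        policy[s] = 1 + int(np.argmax(returns))  # first maximizer on ties
    return V.tolist(), policy
```

For ph = 0.4 this converges to V[50] = 0.4, matching the probability that a single bold bet of 50 wins.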
The AI Interview - Master AI/ML Interviews