A gambler has the chance to bet on a sequence of coin flips. If the coin lands heads, the gambler wins the amount staked; if tails, the gambler loses the stake. The goal is to reach 100, starting from a given capital s (with 0<s<100). The game ends when the gambler reaches 0 (bankruptcy) or 100 (goal). On each flip, the gambler can bet any integer amount from 1 up to min(s,100−s).
The probability of heads is ph (known). Reward is +1 if the gambler reaches 100 in a transition, 0 otherwise.
Your Task:
Write a function gambler_value_iteration(ph, theta=1e-9) that:
- Computes the optimal state-value function V(s) for all s=1,...,99 using value iteration.
- Returns the optimal policy as a mapping from state s to the optimal stake a∗ (can return any optimal stake if there are ties).
Inputs:
ph: probability of heads (float between 0 and 1)theta: threshold for value iteration convergence (default 1e−9)
Returns:
V: array/list of length 101, V[s] is the value for state spolicy: array/list of length 101, policy[s] is the optimal stake in state s (0 if s=0 or s=100)
Examples
Example 1:
Input:
ph = 0.4
V, policy = gambler_value_iteration(ph)
print(round(V[50], 4))
print(policy[50])Output:
0.0178
1Explanation: From state 50, the optimal action is to bet 1, with a probability of reaching 100 of about 0.0178 when ph=0.4.
Starter Code
def gambler_value_iteration(ph, theta=1e-9):
"""
Computes the optimal value function and policy for the Gambler's Problem.
Args:
ph: probability of heads
theta: convergence threshold
Returns:
V: list of values for all states 0..100
policy: list of optimal stakes for all states 0..100
"""
# Your code here
passPython3
ReadyLines: 1Characters: 0
Ready