The AI Interview - Master AI/ML Interviews

Implement advanced checkpoint-based recovery:

Checkpointing:

checkpoint(agent_id, state, step_number): Save state periodically
- Only save if step_number % interval == 0
- Serialize state (use pickle or json)
- Return checkpoint_id
restore(agent_id, checkpoint_id): Load state
replay_from_checkpoint(agent_id, checkpoint_id, actions): Replay
prune_old_checkpoints(agent_id, keep_last): Cleanup
get_recovery_point(agent_id, target_step): Find nearest checkpoint

Checkpoint Format: {'checkpoint_id': ..., 'step': ..., 'timestamp': ..., 'state': ...}

Recovery Strategy:

Find nearest checkpoint before failure
Restore state
Replay actions from checkpoint to failure point

Examples

Example 1:

Input: cr = CheckpointRecovery(checkpoint_interval=5); cid = cr.checkpoint('agent1', {'data': 'state'}, 5); cid is not None

Output: True

Explanation: Checkpoint saved at step 5 (multiple of interval)

Starter Code

import pickle
import time

class CheckpointRecovery:
    """
    Advanced failure recovery with state checkpointing and replay.
    """
    
    def __init__(self, checkpoint_interval=5):
        self.checkpoint_interval = checkpoint_interval
        self.checkpoints = {}  # agent_id -> list of checkpoints
        self.current_states = {}
    
    def checkpoint(self, agent_id, state, step_number):
        """
        Save checkpoint if step_number % interval == 0
        Returns checkpoint_id or None
        """
        # Your implementation here
        pass
    
    def restore(self, agent_id, checkpoint_id=None):
        """
        Restore agent to checkpoint.
        If checkpoint_id is None, restore to latest.
        """
        # Your implementation here
        pass
    
    def replay_from_checkpoint(self, agent_id, checkpoint_id, actions):
        """
        Replay actions from checkpoint to recover state.
        """
        # Your implementation here
        pass
    
    def prune_old_checkpoints(self, agent_id, keep_last=3):
        """Keep only last N checkpoints to save space"""
        # Your implementation here
        pass
    
    def get_recovery_point(self, agent_id, target_step):
        """Find nearest checkpoint before target_step"""
        # Your implementation here
        pass

Advanced Failure Recovery with Checkpointing

Examples

Starter Code