The AI Interview - Master AI/ML Interviews

Generator-Critic Self-Play

Self-play iteratively improves outputs by alternating generation and critique.

Task

Build SelfPlayAgent where:

Generator produces an attempt (given task + prior critique).
Critic scores 0-1 and provides critique text.
Loop repeats until score >= threshold or max_rounds reached.
Returns best attempt (highest score across all rounds).

Constraints

Stop early on threshold met.
Best attempt = highest scored, not necessarily last.
Critique is always fed back to generator as context.

Examples

Example 1:

Input:

scores=[0.5,0.8,0.95]; idx=[0]
def gen(t,p='',c=''): return f'v{idx[0]}'
def crit(t,a): s=scores[idx[0]]; idx[0]+=1; return ('ok',s)
agent=SelfPlayAgent(gen,crit,max_rounds=10)
agent.run('task',threshold=0.9)

Output: {'best_attempt':'v2','best_score':0.95,'rounds_taken':3}

Explanation: Score exceeds 0.9 at round 3; early stop.

Starter Code

from typing import List, Dict, Callable, Tuple
from dataclasses import dataclass

@dataclass
class Round:
    round_num: int
    attempt: str
    critique: str
    score: float

class SelfPlayAgent:
    def __init__(self, generator: Callable, critic: Callable, max_rounds: int = 5):
        # generator(task, previous_attempt, critique) -> str
        # critic(task, attempt) -> Tuple[str, float]  (critique, score 0-1)
        self.generator = generator
        self.critic = critic
        self.max_rounds = max_rounds
        self.rounds: List[Round] = []

    def run(self, task: str, threshold: float = 0.9) -> Dict:
        # TODO: Run loop, stop early if score >= threshold
        # Return {'best_attempt': str, 'best_score': float, 'rounds_taken': int}
        pass

Generator-Critic Self-Play Loop

Generator-Critic Self-Play

Task

Constraints

Examples

Starter Code