GPT-2 Text Generation

Hard
NLP

Implement a Simplified GPT-2-like Text Generation Function

You are tasked with implementing a simplified GPT-2-like text generation function in Python. This function will incorporate the following components of a minimal GPT-2 architecture:

  • Token Embeddings: Map input tokens to dense vector representations.
  • Positional Embeddings: Add positional information to token embeddings.
  • Multi-head Attention: Let each position attend to earlier positions in the sequence, across several heads at once.
  • Feed-Forward Network: Process each position's attention output through a dense layer.
  • Layer Normalization: Stabilize the training process.
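The attention and normalization components above can be sketched in plain NumPy. This is a minimal illustration, not GPT-2's exact implementation: a real block also applies learned query/key/value and output projections, which are omitted here so the shapes stay easy to follow.

```python
import numpy as np

def layer_norm(x, g, b, eps=1e-5):
    # Stabilize activations: zero mean, unit variance per position,
    # then scale by g and shift by b.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(q, k, v):
    # Scaled dot-product attention with a causal mask so each
    # position can only attend to itself and earlier positions.
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    mask = np.triu(np.ones_like(scores), k=1) * -1e10
    return softmax(scores + mask) @ v

def multi_head_attention(x, n_head):
    # Toy version: split the feature dimension into n_head groups,
    # attend within each head, then concatenate the results.
    heads = [causal_self_attention(h, h, h)
             for h in np.split(x, n_head, axis=-1)]
    return np.concatenate(heads, axis=-1)
```

With the dummy hyperparameters (`n_head=2`, 10-dimensional embeddings), each head operates on 5 of the 10 features.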

The function must take in the following parameters:

  1. Prompt: The initial text to guide the generation process.
  2. Number of Tokens to Generate: Specify how many tokens to output.

Your function should output the generated text.

Additionally, use the provided helper function load_encoder_hparams_and_params to retrieve:

  • A dummy encoder.
  • Model hyperparameters.
  • Model parameters.

Build your text generation logic around these components. This exercise is designed to help you understand the core concepts behind GPT-2's autoregressive text generation.
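The autoregressive part reduces to a short loop. Here is one possible sketch, where `forward_fn` is a stand-in for whatever forward pass you build: it should return one row of logits per input position, and each step we greedily append the argmax token predicted at the last position.

```python
import numpy as np

def generate(token_ids, n_tokens_to_generate, forward_fn):
    # Greedy autoregressive decoding: feed the growing sequence back
    # in and append the highest-probability next token each step.
    token_ids = list(token_ids)
    for _ in range(n_tokens_to_generate):
        logits = forward_fn(token_ids)        # (seq_len, vocab_size)
        next_id = int(np.argmax(logits[-1]))  # last position predicts the next token
        token_ids.append(next_id)
    return token_ids[-n_tokens_to_generate:]  # only the newly generated ids
```

Greedy argmax keeps the example deterministic for a fixed forward pass; real GPT-2 decoders often sample with temperature or top-k instead.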

Examples

Example 1:
Input: prompt="hello", n_tokens_to_generate=5
Output: hello hello hello <UNK> <UNK>
Explanation: The function encodes the input "hello" into tokens using the dummy encoder, then runs a simplified GPT-2 forward pass to generate 5 tokens. Finally, it decodes the generated tokens back into text.

Starter Code

import numpy as np

def gen_text(prompt: str, n_tokens_to_generate: int = 40):
	# Your code here
	pass

def load_encoder_hparams_and_params(model_size: str = "124M", models_dir: str = "models"):
	class DummyBPE:
		def __init__(self):
			self.encoder_dict = {"hello": 1, "world": 2, "<UNK>": 0}

		def encode(self, text: str):
			tokens = text.strip().split()
			return [self.encoder_dict.get(token, self.encoder_dict["<UNK>"]) for token in tokens]

		def decode(self, token_ids: list):
			reversed_dict = {v: k for k, v in self.encoder_dict.items()}
			return " ".join([reversed_dict.get(tok_id, "<UNK>") for tok_id in token_ids])

	hparams = {
		"n_ctx": 1024,
		"n_head": 2
	}

	params = {
		"wte": np.random.rand(3, 10),
		"wpe": np.random.rand(1024, 10),
		"blocks": [],
		"ln_f": {
			"g": np.ones(10),
			"b": np.zeros(10),
		}
	}

	encoder = DummyBPE()
	return encoder, hparams, params
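For reference, here is one way the pieces could fit together with the dummy parameters above. Since `blocks` is empty, the forward pass reduces to embeddings, the final layer norm, and a projection through the tied embedding matrix; a condensed copy of the starter's helper is inlined so the sketch runs standalone. Because the embeddings are random, the exact tokens generated will vary between runs.

```python
import numpy as np

def load_encoder_hparams_and_params(model_size="124M", models_dir="models"):
    # Condensed copy of the starter's dummy helper, inlined for self-containment.
    class DummyBPE:
        encoder_dict = {"hello": 1, "world": 2, "<UNK>": 0}
        def encode(self, text):
            return [self.encoder_dict.get(t, 0) for t in text.strip().split()]
        def decode(self, ids):
            rev = {v: k for k, v in self.encoder_dict.items()}
            return " ".join(rev.get(i, "<UNK>") for i in ids)
    params = {
        "wte": np.random.rand(3, 10),
        "wpe": np.random.rand(1024, 10),
        "blocks": [],
        "ln_f": {"g": np.ones(10), "b": np.zeros(10)},
    }
    return DummyBPE(), {"n_ctx": 1024, "n_head": 2}, params

def layer_norm(x, g, b, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def gpt2_forward(token_ids, params):
    # Token embeddings + positional embeddings.
    x = params["wte"][token_ids] + params["wpe"][: len(token_ids)]
    # Transformer blocks would run here; the dummy params have none.
    x = layer_norm(x, params["ln_f"]["g"], params["ln_f"]["b"])
    # Project back to vocabulary logits via the tied embedding matrix.
    return x @ params["wte"].T  # (seq_len, vocab_size)

def gen_text(prompt: str, n_tokens_to_generate: int = 40):
    encoder, hparams, params = load_encoder_hparams_and_params()
    token_ids = encoder.encode(prompt)
    prompt_len = len(token_ids)
    for _ in range(n_tokens_to_generate):
        logits = gpt2_forward(token_ids, params)
        token_ids.append(int(np.argmax(logits[-1])))  # greedy decoding
    return encoder.decode(token_ids[prompt_len:])
```

Note that `hparams["n_head"]` goes unused here because the dummy parameters contain no attention blocks; a full solution would thread it through multi-head attention inside each block.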