Implement a Simplified GPT-2-like Text Generation Function
You are tasked with implementing a simplified GPT-2-like text generation function in Python. This function will incorporate the following components of a minimal GPT-2 architecture:
- Token Embeddings: Map input tokens to dense vector representations.
- Positional Embeddings: Add positional information to token embeddings.
- Multi-head Attention: Let each position attend to itself and earlier positions in the sequence (causal self-attention).
- Feed-Forward Network: Process attention outputs through a position-wise dense network.
- Layer Normalization: Stabilize the training process.
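The numerical building blocks above can be sketched in NumPy. This is a minimal illustration, not the full GPT-2 implementation: `layer_norm`, `softmax`, and `attention` are hypothetical helper names, and the single-head causal attention here omits the multi-head split and learned projections.

```python
import numpy as np

def layer_norm(x, g, b, eps=1e-5):
    # Normalize each row to zero mean / unit variance, then scale by g and shift by b.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention with a causal mask, so each position
    # only attends to itself and earlier positions.
    n = q.shape[0]
    mask = (1 - np.tri(n)) * -1e10          # upper triangle gets a large negative value
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    return softmax(scores) @ v
```

Because of the causal mask, the first output row depends only on the first value vector, which is a quick sanity check when wiring these helpers together.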
The function must take in the following parameters:
- Prompt: The initial text to guide the generation process.
- Number of Tokens to Generate: Specify how many tokens to output.
Your function should output the generated text.
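The autoregressive part of the task can be outlined as a greedy decoding loop. This is a sketch under stated assumptions: `generate` and `forward_fn` are illustrative names (not part of the starter code), and `forward_fn` stands in for whatever forward pass you build that returns per-position logits.

```python
import numpy as np

def generate(input_ids, forward_fn, n_tokens_to_generate):
    # Greedy autoregressive decoding: feed the growing sequence back in and
    # append the argmax token from the last position at each step.
    ids = list(input_ids)
    for _ in range(n_tokens_to_generate):
        logits = forward_fn(ids)                # shape (seq_len, vocab_size)
        ids.append(int(np.argmax(logits[-1])))  # most likely next token
    return ids[len(input_ids):]                 # return only the newly generated IDs
```

Note that the whole sequence is re-run through the model each step; that is inefficient but matches the simplest form of the algorithm this exercise targets.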
Additionally, utilize the helper function load_encoder_hparams_and_params to retrieve:
- A dummy encoder.
- Model hyperparameters.
- Model parameters.
Build your text generation logic around these components. This exercise is designed to help you understand the core concepts behind GPT-2's autoregressive text generation.
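To see what the dummy encoder does before writing any model code, it helps to trace its behavior by hand. The snippet below mirrors the `DummyBPE` logic from the starter code as standalone functions (the standalone `encode`/`decode` names are just for this walkthrough):

```python
# Mirrors DummyBPE from the starter code: whitespace tokenization plus a
# three-entry vocabulary, with unknown words mapped to <UNK> (ID 0).
encoder_dict = {"hello": 1, "world": 2, "<UNK>": 0}

def encode(text):
    return [encoder_dict.get(tok, encoder_dict["<UNK>"]) for tok in text.strip().split()]

def decode(token_ids):
    reversed_dict = {v: k for k, v in encoder_dict.items()}
    return " ".join(reversed_dict.get(t, "<UNK>") for t in token_ids)

print(encode("hello world foo"))  # [1, 2, 0] -- "foo" is out of vocabulary
print(decode([1, 2, 0]))          # hello world <UNK>
```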
Examples
Example 1:
Input:
prompt="hello", n_tokens_to_generate=5
Output:
hello hello hello <UNK> <UNK>
Explanation: The function encodes the input "hello" into tokens using the dummy encoder, then runs a simplified GPT-2 forward pass to generate 5 tokens. Finally, it decodes the generated tokens back into text.
Starter Code
import numpy as np

def gen_text(prompt: str, n_tokens_to_generate: int = 40):
    # Your code here
    pass

def load_encoder_hparams_and_params(model_size: str = "124M", models_dir: str = "models"):
    class DummyBPE:
        def __init__(self):
            self.encoder_dict = {"hello": 1, "world": 2, "<UNK>": 0}

        def encode(self, text: str):
            tokens = text.strip().split()
            return [self.encoder_dict.get(token, self.encoder_dict["<UNK>"]) for token in tokens]

        def decode(self, token_ids: list):
            reversed_dict = {v: k for k, v in self.encoder_dict.items()}
            return " ".join([reversed_dict.get(tok_id, "<UNK>") for tok_id in token_ids])

    hparams = {
        "n_ctx": 1024,
        "n_head": 2,
    }
    params = {
        "wte": np.random.rand(3, 10),
        "wpe": np.random.rand(1024, 10),
        "blocks": [],
        "ln_f": {
            "g": np.ones(10),
            "b": np.zeros(10),
        },
    }
    encoder = DummyBPE()
    return encoder, hparams, params
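One possible shape for the forward pass is sketched below. It assumes, as in the starter parameters, that "blocks" is empty, so the pass reduces to embeddings plus the final layer norm, with logits produced by projecting through the tied embedding matrix. `gpt2_forward` and `layer_norm` are hypothetical helper names, and this `gen_text` takes `encoder` and `params` explicitly for clarity; in the exercise itself you would call `load_encoder_hparams_and_params` inside `gen_text` and keep the original signature.

```python
import numpy as np

def layer_norm(x, g, b, eps=1e-5):
    # Row-wise normalization followed by learned scale g and shift b.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return g * (x - mean) / np.sqrt(var + eps) + b

def gpt2_forward(inputs, params):
    # Token embeddings + positional embeddings.
    x = params["wte"][inputs] + params["wpe"][np.arange(len(inputs))]
    # params["blocks"] is empty in the starter code, so there is no block loop here;
    # with real blocks, attention + feed-forward layers would run at this point.
    x = layer_norm(x, params["ln_f"]["g"], params["ln_f"]["b"])
    return x @ params["wte"].T  # (seq_len, vocab_size) logits via tied embeddings

def gen_text(prompt, encoder, params, n_tokens_to_generate=5):
    # Encode, greedily generate, decode only the newly produced tokens.
    ids = encoder.encode(prompt)
    for _ in range(n_tokens_to_generate):
        logits = gpt2_forward(ids, params)
        ids.append(int(np.argmax(logits[-1])))
    return encoder.decode(ids[-n_tokens_to_generate:])
```

Since the starter parameters are random, the concrete tokens produced will vary; the example output above is illustrative of the mechanism, not a fixed result.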