Implement GRU Cell

Medium
MLE Interview Prep

Problem

Implement a Gated Recurrent Unit (GRU) cell forward pass. The GRU is a type of recurrent neural network architecture that uses gating mechanisms to control the flow of information, helping to mitigate the vanishing gradient problem.

A GRU cell computes a new hidden state given an input vector and the previous hidden state using update and reset gates.

Input Parameters:

  • x: Input vector of shape (input_size,)
  • h_prev: Previous hidden state of shape (hidden_size,)
  • W_z, W_r, W_h: Weight matrices for input of shape (hidden_size, input_size)
  • U_z, U_r, U_h: Weight matrices for hidden state of shape (hidden_size, hidden_size)
  • b_z, b_r, b_h: Bias vectors of shape (hidden_size,)

Output:

  • h_next: New hidden state of shape (hidden_size,)
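The shapes listed above can be checked with a small sketch. The sizes (input_size = 2, hidden_size = 3) mirror Example 1; the fill value 0.1 is an assumption chosen only for illustration.

```python
import numpy as np

# Illustrative sizes (assumed here; they match Example 1).
input_size, hidden_size = 2, 3

x = np.array([1.0, 0.5])                         # (input_size,)
h_prev = np.zeros(hidden_size)                   # (hidden_size,)
W_z = np.full((hidden_size, input_size), 0.1)    # (hidden_size, input_size)
U_z = np.full((hidden_size, hidden_size), 0.1)   # (hidden_size, hidden_size)
b_z = np.zeros(hidden_size)                      # (hidden_size,)

# Each gate pre-activation combines the three terms into one vector
# with one entry per hidden unit.
pre_activation = W_z @ x + U_z @ h_prev + b_z
print(pre_activation.shape)  # (3,)
```

Because every term has shape (hidden_size,), the gates and the new hidden state can be combined elementwise without reshaping.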

The GRU uses sigmoid activations for the two gates and tanh for the candidate hidden state. The update gate controls how the new hidden state blends the previous hidden state with the candidate, while the reset gate controls how much of the previous hidden state is used when computing the candidate hidden state.
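The problem statement does not spell out the equations, so the following is one common formulation, written with the parameter names above. It is an assumption that this is the intended convention, but it agrees with Example 1, where h_next reduces to z * h_tilde when h_prev = 0. Some references swap the roles of z and 1 - z, and some apply the reset gate after the hidden-state matrix product rather than before it.

```latex
\begin{aligned}
z &= \sigma\left(W_z x + U_z h_{\text{prev}} + b_z\right) \\
r &= \sigma\left(W_r x + U_r h_{\text{prev}} + b_r\right) \\
\tilde{h} &= \tanh\left(W_h x + U_h \left(r \odot h_{\text{prev}}\right) + b_h\right) \\
h_{\text{next}} &= (1 - z) \odot h_{\text{prev}} + z \odot \tilde{h}
\end{aligned}
```

Here \sigma is the elementwise sigmoid and \odot is elementwise multiplication.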

Examples

Example 1:
Input: x = [1.0, 0.5], h_prev = [0.0, 0.0, 0.0], all weight matrices filled with 0.1 or 0.2, all biases are zeros
Output: [0.1565, 0.1565, 0.1565]
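Explanation: Because h_prev = 0, every term involving h_prev vanishes, so the reset gate has no effect and the U matrices do not contribute. The update gate is z = sigmoid(0.15) = 0.5374 for each unit, where 0.15 = 0.1 * 1.0 + 0.1 * 0.5 (W_z entries of 0.1). The candidate is h_tilde = tanh(0.3) = 0.2913 for each unit, where 0.3 = 0.2 * 1.0 + 0.2 * 0.5 (W_h entries of 0.2). The final hidden state is h_next = (1 - z) * h_prev + z * h_tilde = z * h_tilde = 0.5374 * 0.2913 = 0.1565 for each unit. Multiplying the unrounded values instead gives about 0.15656.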
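The arithmetic in the explanation can be reproduced for a single hidden unit. This sketch assumes, as the explanation's numbers imply, that each row of W_z is filled with 0.1 and each row of W_h with 0.2; the `sigmoid` helper is defined here and is not part of the starter code.

```python
import numpy as np

def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([1.0, 0.5])

# One hidden unit: a row of 0.1s in W_z and a row of 0.2s in W_h (assumed).
z_pre = 0.1 * x.sum()   # 0.1 * 1.0 + 0.1 * 0.5, about 0.15
h_pre = 0.2 * x.sum()   # 0.2 * 1.0 + 0.2 * 0.5, about 0.3

z = sigmoid(z_pre)       # approximately 0.5374
h_tilde = np.tanh(h_pre) # approximately 0.2913

# With h_prev = 0 the (1 - z) * h_prev term vanishes.
h_next = z * h_tilde     # approximately 0.15656
print(z, h_tilde, h_next)
```

The small gap between 0.1565 and 0.15656 comes only from whether the gate values are rounded to four decimals before multiplying, so a tolerance-based comparison is safer than exact equality when checking a solution.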

Starter Code

import numpy as np

def gru_cell(x: np.ndarray, h_prev: np.ndarray,
             W_z: np.ndarray, U_z: np.ndarray, b_z: np.ndarray,
             W_r: np.ndarray, U_r: np.ndarray, b_r: np.ndarray,
             W_h: np.ndarray, U_h: np.ndarray, b_h: np.ndarray) -> np.ndarray:
    """
    Implements a single GRU cell forward pass.
    
    Args:
        x: Input vector of shape (input_size,)
        h_prev: Previous hidden state of shape (hidden_size,)
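        W_z, W_r, W_h: Weight matrices for input, each of shape (hidden_size, input_size)
        U_z, U_r, U_h: Weight matrices for hidden state, each of shape (hidden_size, hidden_size)
        b_z, b_r, b_h: Bias vectors, each of shape (hidden_size,)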
    
    Returns:
        h_next: New hidden state of shape (hidden_size,)
    """
    # Your code here
    pass
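One possible way to fill in the starter code is sketched below. It is not presented as the official solution: the names `gru_cell_sketch` and `_sigmoid` are hypothetical, and the update rule h_next = (1 - z) * h_prev + z * h_tilde, with the reset gate applied to h_prev before the U_h product, is an assumed convention that happens to agree with Example 1.

```python
import numpy as np

def _sigmoid(a: np.ndarray) -> np.ndarray:
    """Elementwise logistic function (helper, not part of the starter code)."""
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell_sketch(x, h_prev,
                    W_z, U_z, b_z,
                    W_r, U_r, b_r,
                    W_h, U_h, b_h):
    """Single GRU step under the assumed convention described above."""
    # Update gate: how much of the candidate enters the new state.
    z = _sigmoid(W_z @ x + U_z @ h_prev + b_z)
    # Reset gate: how much of h_prev feeds into the candidate.
    r = _sigmoid(W_r @ x + U_r @ h_prev + b_r)
    # Candidate hidden state, computed from the reset-scaled previous state.
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h_prev) + b_h)
    # Blend the previous state and the candidate elementwise.
    return (1.0 - z) * h_prev + z * h_tilde

# Example 1 setup. W_z = 0.1 and W_h = 0.2 follow from the explanation;
# the other fill values are assumptions and do not matter when h_prev = 0.
input_size, hidden_size = 2, 3
x = np.array([1.0, 0.5])
h_prev = np.zeros(hidden_size)
W_z = np.full((hidden_size, input_size), 0.1)
W_r = np.full((hidden_size, input_size), 0.1)
W_h = np.full((hidden_size, input_size), 0.2)
U_z = np.full((hidden_size, hidden_size), 0.1)
U_r = np.full((hidden_size, hidden_size), 0.1)
U_h = np.full((hidden_size, hidden_size), 0.2)
b_z = b_r = b_h = np.zeros(hidden_size)

h_next = gru_cell_sketch(x, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h)
print(np.round(h_next, 4))
```

Since Example 1 uses h_prev = 0, it cannot distinguish between the common GRU variants; a test with a nonzero h_prev would be needed to confirm which convention a grader expects.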