Build a Simple ETL Pipeline (MLOps)

Medium
MLOps

Problem

Implement a simple ETL (Extract, Transform, Load) pipeline that prepares model-ready data.

Given a CSV-like string containing user events with columns: user_id,event_type,value (header included), write a function run_etl(csv_text) that:

  1. Extracts rows from the raw CSV text.
  2. Transforms data by:
    • Filtering only rows where event_type == "purchase".
    • Converting value to float and dropping invalid rows.
    • Aggregating total purchase value per user_id.
  3. Loads the transformed results by returning a list of (user_id, total_value) tuples sorted by user_id ascending.

Assume small inputs and use no external libraries. Handle extra whitespace around fields and ignore blank lines.
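The three steps above can be sketched as follows. This is one possible approach, not the only correct solution; the helper name `run_etl_sketch` is illustrative, and it uses a plain dict for aggregation:

```python
def run_etl_sketch(csv_text: str) -> list[tuple[str, float]]:
    """Illustrative sketch of the extract/transform/load steps."""
    # Extract: split into stripped lines, skip the header, drop blanks.
    lines = [ln.strip() for ln in csv_text.splitlines()]
    rows = [ln for ln in lines[1:] if ln]

    # Transform: keep purchase rows with numeric values; aggregate per user.
    totals: dict[str, float] = {}
    for row in rows:
        parts = [field.strip() for field in row.split(",")]
        if len(parts) != 3:
            continue  # skip malformed rows
        user_id, event_type, value = parts
        if event_type != "purchase":
            continue
        try:
            amount = float(value)
        except ValueError:
            continue  # drop rows whose value is not numeric
        totals[user_id] = totals.get(user_id, 0.0) + amount

    # Load: return (user_id, total_value) pairs sorted by user_id.
    return sorted(totals.items())
```

Sorting `totals.items()` directly works because tuples compare element-wise, so the pairs end up ordered by `user_id` ascending, as required.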

Examples

Example 1:
Input: run_etl("user_id,event_type,value\n u1, purchase, 10.0\n u2, view, 1.0\n u1, purchase, 5\n u3, purchase, not_a_number\n u2, purchase, 3.5 \n\n")
Output: [('u1', 15.0), ('u2', 3.5)]
Explanation: Keep only purchases; convert values; drop invalid; aggregate per user; sort by user_id.

Starter Code

# Implement your function below.

def run_etl(csv_text: str) -> list[tuple[str, float]]:
	"""Run a simple ETL pipeline over CSV text with header user_id,event_type,value.

	Returns a sorted list of (user_id, total_value) for event_type == "purchase".
	"""
	# TODO: implement extract, transform, and load steps
	raise NotImplementedError
The AI Interview - Master AI/ML Interviews