Build a Simple ETL Pipeline (MLOps)

Medium
MLOps

Problem

Implement a simple ETL (Extract, Transform, Load) pipeline that prepares model-ready data.

Given a CSV-like string containing user events with columns: user_id,event_type,value (header included), write a function run_etl(csv_text) that:

  1. Extracts rows from the raw CSV text.
  2. Transforms data by:
    • Filtering only rows where event_type == "purchase".
    • Converting value to float and dropping invalid rows.
    • Aggregating total purchase value per user_id.
  3. Loads the transformed results by returning a list of (user_id, total_value) tuples sorted by user_id ascending.

Assume small inputs and use no external libraries. Handle extra whitespace around fields and ignore blank lines.
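The three steps above can be sketched as follows. This is one possible approach, not the only correct solution; the helper name `run_etl_sketch` is illustrative, and it uses a plain dict for aggregation:

```python
def run_etl_sketch(csv_text: str) -> list[tuple[str, float]]:
    """Illustrative sketch of the extract/transform/load steps."""
    # Extract: split into stripped lines, skip the header, drop blanks.
    lines = [ln.strip() for ln in csv_text.splitlines()]
    rows = [ln for ln in lines[1:] if ln]

    # Transform: keep purchase rows with numeric values; aggregate per user.
    totals: dict[str, float] = {}
    for row in rows:
        parts = [field.strip() for field in row.split(",")]
        if len(parts) != 3:
            continue  # skip malformed rows
        user_id, event_type, value = parts
        if event_type != "purchase":
            continue
        try:
            amount = float(value)
        except ValueError:
            continue  # drop rows whose value is not numeric
        totals[user_id] = totals.get(user_id, 0.0) + amount

    # Load: return (user_id, total_value) pairs sorted by user_id.
    return sorted(totals.items())
```

Sorting `totals.items()` directly works because tuples compare element-wise, so the pairs end up ordered by `user_id` ascending, as required.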

Examples

Example 1:
Input: run_etl("user_id,event_type,value\n u1, purchase, 10.0\n u2, view, 1.0\n u1, purchase, 5\n u3, purchase, not_a_number\n u2, purchase, 3.5 \n\n")
Output: [('u1', 15.0), ('u2', 3.5)]
Explanation: Keep only purchases; convert values; drop invalid; aggregate per user; sort by user_id.

Starter Code

# Implement your function below.

def run_etl(csv_text: str) -> list[tuple[str, float]]:
	"""Run a simple ETL pipeline over CSV text with header user_id,event_type,value.

	Returns a sorted list of (user_id, total_value) for event_type == "purchase".
	"""
	# TODO: implement extract, transform, and load steps
	raise NotImplementedError
The AI Interview - Master AI/ML Interviews