Problem
Implement a simple ETL (Extract-Transform-Load) pipeline for model-ready data preparation.
Given a CSV-like string containing user events with columns: user_id,event_type,value (header included), write a function run_etl(csv_text) that:
- Extracts rows from the raw CSV text.
- Transforms data by:
  - Filtering only rows where event_type == "purchase".
  - Converting value to float and dropping invalid rows.
  - Aggregating total purchase value per user_id.
- Loads the transformed results by returning a list of (user_id, total_value) tuples sorted by user_id ascending.
Assume small inputs (no external libs), handle extra whitespace, and ignore blank lines.
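The steps above can be sketched as a single pass over the lines: skip the header, keep purchase rows, coerce the value, accumulate per user, then sort. This is one possible reference sketch (stdlib only, illustrative rather than the required solution):

```python
def run_etl(csv_text: str) -> list[tuple[str, float]]:
    totals: dict[str, float] = {}
    # Extract: split into non-blank lines, dropping the header row.
    lines = [ln for ln in csv_text.splitlines() if ln.strip()]
    for line in lines[1:]:
        # Tolerate extra whitespace around each field.
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # malformed row
        user_id, event_type, raw_value = parts
        # Transform: keep purchases only, drop rows with invalid values.
        if event_type != "purchase":
            continue
        try:
            value = float(raw_value)
        except ValueError:
            continue
        totals[user_id] = totals.get(user_id, 0.0) + value
    # Load: return (user_id, total_value) pairs sorted by user_id.
    return sorted(totals.items())
```

Sorting `totals.items()` directly works because tuples compare element-wise, so ordering falls out of the user_id keys.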
Examples
Example 1:
Input:
run_etl("user_id,event_type,value\n u1, purchase, 10.0\n u2, view, 1.0\n u1, purchase, 5\n u3, purchase, not_a_number\n u2, purchase, 3.5 \n\n")
Output:
[('u1', 15.0), ('u2', 3.5)]
Explanation: Keep only purchases; convert values; drop invalid; aggregate per user; sort by user_id.
Starter Code
# Implement your function below.
def run_etl(csv_text: str) -> list[tuple[str, float]]:
"""Run a simple ETL pipeline over CSV text with header user_id,event_type,value.
Returns a sorted list of (user_id, total_value) for event_type == "purchase".
"""
# TODO: implement extract, transform, and load steps
raise NotImplementedErrorPython3