Definition

Data Pipeline

Also called: data workflow.

A data pipeline is an automated sequence that moves data from a source, through cleaning and transformation, to a destination, and repeats on a schedule without anyone running each step by hand.

What makes it a pipeline

The defining feature is that it runs itself. A one-off export, clean and report is the manual version of the same logic; a pipeline is that logic scheduled, monitored and repeatable. When a source updates, the next scheduled run picks up the change and the downstream report refreshes.
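To make "scheduled, monitored and repeatable" concrete, here is a minimal Python sketch. The function name run_on_schedule, its parameters and the shape of the history records are all hypothetical, chosen for illustration. A real setup would more likely hand the timing to cron or an orchestrator than loop in-process.

```python
import time
from typing import Callable


def run_on_schedule(
    job: Callable[[], int],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Run `job` up to `max_runs` times, pausing between runs.

    `job` is assumed to return the number of rows it processed.
    Returns one history record per run, which is the "monitored" part.
    """
    history = []
    for run_number in range(1, max_runs + 1):
        try:
            rows = job()
            history.append({"run": run_number, "status": "ok", "rows": rows})
        except Exception as exc:
            # A failed run is recorded rather than stopping the schedule,
            # so the next run can still pick up the source.
            history.append(
                {"run": run_number, "status": "failed", "error": str(exc)}
            )
        if run_number < max_runs:
            sleep(interval_seconds)
    return history
```

The sleep function is passed in so the loop can be exercised without waiting, for example run_on_schedule(job, 3600, 3, sleep=lambda s: None).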

Pipeline vs a one-off

  • One-off: you run extract → transform → load by hand when you need it.
  • Pipeline: the same steps run on a cron schedule, with refresh and delivery, unattended.
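The contrast above can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation: the column names client and amount, the in-memory destination dictionary and the script name in the cron comment are all hypothetical.

```python
import csv
import io


def extract(raw_csv: str) -> list[dict]:
    """Extract: parse CSV text from the source into rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: drop rows with no amount, tidy names, convert types."""
    cleaned = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # skip incomplete rows
        cleaned.append(
            {"client": row["client"].strip().title(), "amount": float(amount)}
        )
    return cleaned


def load(rows: list[dict], destination: dict) -> None:
    """Load: add each amount to the per-client total in the destination."""
    for row in rows:
        client = row["client"]
        destination[client] = destination.get(client, 0.0) + row["amount"]


def run_once(raw_csv: str, destination: dict) -> None:
    """The one-off: extract, transform and load, run by hand."""
    load(transform(extract(raw_csv)), destination)


# Hypothetical sample data standing in for a real source export.
SAMPLE = "client,amount\n acme ,10.5\nglobex,\nacme,4.5\n"
report: dict = {}
run_once(SAMPLE, report)

# The pipeline version is the same run_once call triggered by a scheduler
# instead of a person, e.g. a crontab entry such as (script name assumed):
#   0 6 * * * python run_report.py
```

The logic is identical in both cases; only what triggers it changes.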

The free tools here let you run the steps once, by hand. Graduating to a workspace turns that into a pipeline: a source pulled on a schedule and a report that refreshes and delivers itself. See also ETL, the three steps a pipeline automates.

See the end of a pipeline — a report built from data, ready to schedule: Automated Client Reporting →

Frequently asked questions

What is the difference between a data pipeline and ETL?
ETL names the steps (extract, transform, load). A data pipeline is the automated, scheduled system that runs those steps repeatedly, so the data and any reports stay current on their own.

Do small teams need a data pipeline?
If you produce the same report from the same kind of data on a regular cadence, a lightweight pipeline is where you save the most time: it removes the repetitive manual run, not the thinking.