dbt or Airflow: a working answer for analytics teams

I get asked some version of this question every month: should we use dbt or Airflow? The framing is wrong. The question is not which tool to pick, but what each tool is good for and where the boundary between them belongs. The teams that get this right move much faster than the teams that pick one and try to make it do everything.

What dbt is for

dbt is a tool for transforming data that is already in your warehouse into other data in that same warehouse, in a way that is testable and version-controlled. It is not a workflow engine. It is not a scheduler. It is a SQL-first model compilation system with a thoughtful test framework wrapped around it.

The right work for dbt is the part of your pipeline shaped like this: given these tables in the warehouse, produce these other tables. Cleaning, standardising, joining, aggregating, building the dimensional models, producing the metrics layer: all of that work belongs in dbt. It belongs there because dbt turns it into testable units with clear lineage, which is what analytics work needs.

What Airflow is for

Airflow, or any modern equivalent, is a workflow engine. It is good at orchestrating heterogeneous tasks, handling external dependencies, retrying with backoff, and managing the part of the pipeline that lives outside the warehouse.

The right work for Airflow is the part of your pipeline shaped like this: fetch this from this API, drop it on this filesystem, kick off this dbt run, move this output to this destination, and notify this team if any of those steps fail. Ingestion, file movement, third-party API calls, notifications, and orchestrating downstream consumers of warehouse data: all of that work belongs in Airflow.
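To make that shape concrete, here is a toy sketch in plain Python of what a workflow engine takes off your hands: run steps in order, retry each one with exponential backoff, and notify someone when a step finally fails. This is not Airflow's API. The function name run_pipeline, its parameters, and the idea of passing steps as plain callables are all made up for illustration; in Airflow these behaviours come from the scheduler and task settings rather than from code you write yourself.

```python
import time
from typing import Callable, List, Sequence, Tuple


def run_pipeline(
    steps: Sequence[Tuple[str, Callable[[], None]]],
    notify: Callable[[str], None],
    retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run named steps in order and return the names that completed.

    Each step is retried up to `retries` times, waiting
    base_delay * 2**attempt seconds between attempts. When a step
    exhausts its retries, `notify` is called once and the pipeline
    stops, so later steps never run on top of a failed one.
    """
    completed: List[str] = []
    for name, action in steps:
        for attempt in range(retries + 1):
            try:
                action()
                break  # step succeeded, move on to the next one
            except Exception as exc:
                if attempt == retries:
                    notify(f"step {name!r} failed after {attempt + 1} attempts: {exc}")
                    return completed
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        completed.append(name)
    return completed
```

Every line of this is something you would rather not own. That is the argument for a workflow engine: the ordering, retry, and alerting logic is configuration, not code your team has to maintain.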

Where teams go wrong

The two failure modes I see are symmetric.

Some teams try to do everything in dbt. They write Python models that fetch data from external APIs, treat dbt as their scheduler by chaining models together with run conditions, and end up with a system that is hard to monitor and impossible to debug when an external dependency fails. dbt is not designed for this, and the team ends up fighting the tool.

Other teams try to do everything in Airflow. They write hand-rolled SQL transformation tasks, manage their own DAG of table dependencies, and reinvent half of what dbt provides out of the box. Their transformation logic is not tested, their lineage is not visible, and their analysts cannot contribute because the SQL is wrapped in Python operators.
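For a sense of what "manage their own DAG of table dependencies" means in practice, here is a minimal sketch of the bookkeeping involved, using Python's standard graphlib module. The table names are hypothetical. The point is that someone has to keep this mapping in sync with the SQL by hand, whereas dbt infers the same graph from the ref() calls inside each model.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hand-maintained map of table -> tables it reads from.
# All table names here are hypothetical examples.
dependencies: Dict[str, Set[str]] = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "fct_orders": {"stg_orders", "stg_customers"},
    "daily_revenue": {"fct_orders"},
}


def build_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every table comes after the tables it reads from."""
    return list(TopologicalSorter(deps).static_order())
```

Calling build_order(dependencies) yields a valid run order, but only for as long as the dictionary matches the queries. The first time someone adds a join and forgets to update it, the order is silently wrong.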

The boundary that works

The boundary I have settled on is simple. Anything that is a SQL transformation belongs in dbt. Anything that is not belongs in Airflow. Airflow runs a single dbt command per project, and dbt expands that command into its own internal DAG of models. The orchestration responsibility lives at the Airflow layer. The transformation responsibility lives at the dbt layer. The tools do not overlap.
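As a rough sketch of how thin that interface can be, the helper below builds the one dbt invocation an orchestrator would run for a project. The helper names, the project path, and the "prod" target are assumptions for illustration; the dbt flags used (build, --project-dir, --target, --select) are standard dbt CLI options. In Airflow, the resulting string would typically be passed as the bash_command of a BashOperator task.

```python
import shlex
from typing import List, Optional


def dbt_build_command(
    project_dir: str,
    target: str = "prod",          # hypothetical target name
    select: Optional[str] = None,  # optional dbt node selector
) -> List[str]:
    """Build the argv for one dbt invocation covering a whole project.

    `dbt build` runs the project's models and tests in dependency
    order, so the orchestrator does not need to know about
    individual models.
    """
    cmd = ["dbt", "build", "--project-dir", project_dir, "--target", target]
    if select:
        cmd += ["--select", select]
    return cmd


def as_shell(cmd: List[str]) -> str:
    """Quote the argv safely for use as a single shell command string."""
    return shlex.join(cmd)
```

With a hypothetical project at /opt/analytics, as_shell(dbt_build_command("/opt/analytics")) produces "dbt build --project-dir /opt/analytics --target prod". That one string is everything the orchestration layer needs to know about the transformation layer.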

That boundary has held up across three different organisations and four different team sizes. It scales because it lets each team work on the part of the pipeline it owns, with the abstractions that fit. The data engineers run Airflow. The analytics engineers run dbt. The interface between them is the warehouse: it is where their work meets and where the contract between them lives.

What about the new entrants

There are now several tools that promise to collapse this distinction. Some of them are good. None of them are yet good enough to replace the dbt-and-Airflow pattern in a production-scale analytics function. I will revisit this in 2026 if the picture has changed. For now, the boring answer is the right one. Use both. Respect the boundary. Stop trying to make either tool do the other's job.