
Airflow and dbt Cloud

In some cases, Airflow may be the preferred orchestrator for your organization over working fully within dbt Cloud. There are a few reasons your team might be considering using Airflow to orchestrate your dbt jobs:

  • Your team is already using Airflow to orchestrate other processes
  • Your team needs to ensure that a dbt job kicks off before or after another process outside of dbt Cloud
  • Your team needs flexibility to manage more complex scheduling, such as kicking off one dbt job only after another has completed
  • Your team wants to own their own orchestration solution
  • You need code to work right now without starting from scratch

How are people using Airflow + dbt today?

Airflow + dbt Core

GitLab has published many great examples through its open source data engineering work. Example: here. This approach is especially appropriate if you are well-versed in Kubernetes, CI/CD, and Docker task management when building your Airflow pipelines. If this describes you and your team, you’re in good hands reading through more details: here and here.

Airflow + dbt Cloud API w/Custom Scripts

This has served as a bridge until the fabled Astronomer + dbt Labs-built dbt Cloud provider became generally available: here

There have been many different permutations of this over time:
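To make the custom-script approach concrete, here is a minimal sketch of what such a script might do: build the request that triggers a dbt Cloud job, then poll until the run reaches a terminal state. It is an illustration only, not the code any particular team uses. The helper names (`build_trigger_request`, `wait_for_run`), the default `cause` text, and the polling defaults are all hypothetical, and the base URL assumes the multi-tenant `cloud.getdbt.com` host, which may differ for your account.

```python
import json
import time
import urllib.request

# Assumption: multi-tenant US host. Other regions or single-tenant
# deployments use a different base URL.
DBT_CLOUD_BASE_URL = "https://cloud.getdbt.com/api/v2"

# Numeric run statuses from the dbt Cloud v2 API that mean "finished".
# Anything else (queued, starting, running) means "keep waiting".
TERMINAL_STATUSES = {10: "success", 20: "error", 30: "cancelled"}


def build_trigger_request(account_id, job_id, api_token, cause="Triggered by Airflow"):
    """Build (but do not send) the POST request that starts a dbt Cloud job run."""
    url = f"{DBT_CLOUD_BASE_URL}/accounts/{account_id}/jobs/{job_id}/run/"
    body = json.dumps({"cause": cause}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {api_token}",
            "Content-Type": "application/json",
        },
    )


def wait_for_run(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status code.

    get_status is any zero-argument callable returning the numeric run
    status, for example a function that fetches the run from the API.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return TERMINAL_STATUSES[status]
        sleep(poll_seconds)
    raise TimeoutError("dbt Cloud run did not finish in time")
```

Sending the request is a matter of passing it to `urllib.request.urlopen` and reading the run ID from the JSON response. Even in this small form, the script has to handle secrets, timeouts, and status handling itself, and those responsibilities are what become hard to maintain as a team grows.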

This guide's process

These solutions are great, but they can be difficult to trust as your team grows and the management of things like testing, job definitions, secrets, and pipelines increases past your team’s capacity. Roles become blurry (or were never clearly defined at the start!). Both data and analytics engineers start digging through custom logging within each other’s workflows to make heads or tails of where and what the issue really is. Not to mention that when the issue is found, it can be even harder to decide on the best path forward for safely implementing fixes. This complex workflow and unclear delineation of process management result in a lot of misunderstandings and wasted time just trying to get the process to work smoothly!

A better way

In today’s walkthrough, you’ll get hands-on experience:

  1. Creating a working local Airflow environment
  2. Invoking a dbt Cloud job with Airflow (with proof!)
  3. Reusing tested and trusted Airflow code for your specific use cases

While you’re learning the ropes, you’ll also gain a better understanding of how this helps to:

  • Reduce the cognitive load when building and maintaining pipelines
  • Avoid dependency hell (think: pip install conflicts)
  • Implement better recoveries from failures
  • Define clearer workflows so that data and analytics engineers work better, together ♥️

Prerequisites

🙌 Let’s get started! 🙌