Running dbt in production means setting up a system to run a dbt job on a schedule, rather than running dbt commands manually from the command line. A job is a series of dbt commands; for example, a production dbt job may run `dbt seed`, `dbt run`, and `dbt test`. These production jobs should create the tables and views that your business intelligence tools and end users query. Before continuing, make sure you understand dbt's approach to managing environments.
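The following minimal Python sketch makes the idea of a job concrete. It runs the three example commands in order and stops at the first non-zero exit code. It is only an illustration, not how dbt Cloud or any particular scheduler implements jobs. It assumes the `dbt` executable is on your `PATH`, and `PRODUCTION_JOB`, `run_job`, and `run_subprocess` are hypothetical names.

```python
import subprocess
from typing import Callable, List, Sequence

# The example job from the text: each inner list is one dbt command.
PRODUCTION_JOB: List[List[str]] = [
    ["dbt", "seed"],
    ["dbt", "run"],
    ["dbt", "test"],
]


def run_subprocess(cmd: Sequence[str]) -> int:
    """Run a single command and return its exit code (assumes `dbt` is on PATH)."""
    return subprocess.run(list(cmd)).returncode


def run_job(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = run_subprocess,
) -> int:
    """Run each step in order; return the first non-zero exit code, or 0 if all succeed."""
    for step in steps:
        exit_code = runner(step)
        if exit_code != 0:
            # A failed model build or test ends the job here; later steps are skipped.
            return exit_code
    return 0
```

A scheduler would call `run_job(PRODUCTION_JOB)` and treat a non-zero return value as a failed run. The `runner` parameter exists only so the sequencing logic can be exercised without dbt installed.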
dbt commands in production
We've written a guide for the dbt commands we run in production, over on Discourse.
Beyond setting up a schedule, there are a number of other things to consider when running dbt in production, such as:
- The complexity involved in creating a new dbt job, or editing an existing one.
- Setting up notifications if a step within your job returns an error code (e.g. a model cannot be built, or a test fails).
- Accessing logs to help debug any issues.
- Pulling the latest version of your git repo before running dbt (i.e. continuous deployment).
- Running your dbt project before merging code into master (i.e. continuous integration).
- Giving access to team members who need to collaborate on your dbt project.
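Several of the items above (pulling the latest code, keeping logs, and notifying on an error code) are what you end up writing yourself if you build your own scheduling setup. The sketch below is a hypothetical Python wrapper and is not part of dbt. It runs `git pull --ff-only` in the project checkout, passes `--project-dir` to each dbt command, appends all output to a log file, and calls a `notify` callback you supply when a step fails. The callback could, for example, post to a chat channel; that part is left to you and is an assumption.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

# A runner takes a command and returns (exit code, captured output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_capturing(cmd: Sequence[str]) -> Tuple[int, str]:
    """Run a command, returning its exit code and combined stdout/stderr."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def deploy_and_run(
    repo_dir: Path,
    dbt_commands: Sequence[Sequence[str]],
    log_file: Path,
    notify: Callable[[str], None],
    runner: Runner = run_capturing,
) -> int:
    """Pull the latest code, run dbt commands, log everything, notify on failure."""
    # Step 1: update the checkout so the run uses the latest version of the repo.
    steps: List[List[str]] = [["git", "-C", str(repo_dir), "pull", "--ff-only"]]
    # Step 2: point every dbt command at that checkout.
    steps += [list(cmd) + ["--project-dir", str(repo_dir)] for cmd in dbt_commands]
    with log_file.open("a") as log:
        for step in steps:
            code, output = runner(step)
            # Keep a record of each command and its output for later debugging.
            log.write(f"$ {' '.join(step)}\n{output}\n")
            if code != 0:
                notify(f"Step failed with exit code {code}: {' '.join(step)}")
                return code
    return 0
```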
We've built dbt Cloud from the ground up to empower data teams to easily run dbt in production. With dbt Cloud, you can:
- run your jobs on a schedule
- view logs for any historical invocation of dbt
- configure error notifications
- render your project's documentation
If you're interested in giving dbt Cloud a spin, you can sign up for a forever free account here.
dbt Cloud in action
If your organization is already using Airflow, it could be a great way to kick off your dbt runs. There are a number of ways you can run your dbt jobs in Airflow, including:
- Using this dbt-cloud-plugin. This plugin gives you the best of both worlds: deep integration of dbt into your existing data stack, along with all of the benefits of dbt Cloud.
- Invoking dbt through the BashOperator. In this case, be sure to install dbt into a virtual environment to avoid issues with conflicting dependencies between Airflow and dbt.
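For the BashOperator route, one way to keep dbt isolated from Airflow's own dependencies is to call the `dbt` executable inside a dedicated virtual environment directly, instead of relying on whatever `dbt` is on Airflow's `PATH`. The helper below is a hypothetical sketch that builds such a command string. The virtualenv and project paths are assumptions you would replace with your own.

```python
import shlex
from pathlib import Path
from typing import Sequence


def dbt_bash_command(venv_dir: str, project_dir: str, subcommands: Sequence[str]) -> str:
    """Build a bash command that runs dbt from its own virtualenv (POSIX layout assumed)."""
    dbt_bin = str(Path(venv_dir) / "bin" / "dbt")
    parts = [
        # Quote every path so spaces or special characters cannot break the command.
        f"{shlex.quote(dbt_bin)} {shlex.quote(sub)} --project-dir {shlex.quote(project_dir)}"
        for sub in subcommands
    ]
    # '&&' stops the chain at the first failing command and propagates its exit code.
    return " && ".join(parts)
```

In an Airflow 2.x DAG file, the resulting string could be passed as `BashOperator(task_id="dbt_build", bash_command=dbt_bash_command("/opt/dbt-venv", "/opt/analytics", ["seed", "run", "test"]))`, with `BashOperator` imported from `airflow.operators.bash`. The task ID and paths here are illustrative. Because the commands are chained with `&&`, a failing step ends the chain with a non-zero exit code, which Airflow generally treats as a task failure.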
Automation servers such as CodeDeploy, GitLab CI/CD (video), Bamboo, and Jenkins can be used to schedule bash commands for dbt. They also provide a UI for viewing command-line logs, and they integrate with your git repository.
Cron is a decent way to schedule bash commands. However, while it may seem like an easy route to scheduling a job, writing the code to handle all of the additional features a production deployment needs often makes this route more complex than the other options listed here.
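Here is one example of that extra code. Cron starts each scheduled run whether or not the previous one has finished, so a long dbt run can overlap with the next. The hypothetical Python helper below lets a cron-invoked wrapper skip a run when another one still holds a lock file. It is Unix-only because it uses `fcntl.flock`, and the lock path you pass in is an assumption.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def single_instance(lock_path: str) -> Iterator[bool]:
    """Yield True if this process got an exclusive lock on lock_path, else False."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting for the holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            # Another run (another open file description) already holds the lock.
            acquired = False
        yield acquired
    finally:
        # Closing the descriptor releases any lock taken through it.
        os.close(fd)
```

A wrapper script called from cron might open `with single_instance("/tmp/dbt-prod.lock") as acquired:` and exit early when `acquired` is `False`. The lock is released automatically when the descriptor is closed, including when the process exits.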