Parallel microbatch execution

Use parallel batch execution to process your microbatch models faster.

The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Depending on your use case, configuring your microbatch models to run in parallel offers faster processing, in comparison to running batches sequentially.

Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models.

dbt automatically detects whether a batch can be run in parallel in most cases, which means you don’t need to configure this setting. However, the concurrent_batches config is available as an override (not a gate), allowing you to specify whether batches should or shouldn’t be run in parallel in specific cases.

For example, if you have a microbatch model with 12 batches, you can configure those batches to run in parallel. They'll run concurrently, limited by the number of available threads.

Prerequisites

To enable parallel execution, you must:

  • Use a supported adapter (currently Snowflake).
    • More adapters coming soon!
    • We'll be continuing to test and add concurrency support for adapters. This means that some adapters might get concurrency support after the 1.9 release.
  • Meet additional conditions described in the following section.

How parallel batch execution works

A batch can only run in parallel if all of these conditions are met:

  • Batch is not the first batch.
  • Batch is not the last batch.
  • Adapter supports parallel batches.

After checking the conditions in the previous list — and if a concurrent_batches value isn't set — dbt will intelligently auto-detect whether the model invokes the {{ this }} Jinja function. If it references {{ this }}, the batches will run sequentially, since {{ this }} represents the current model's relation and concurrent writes to the same relation would conflict.

Otherwise, if {{ this }} isn't detected (and the other conditions are met), the batches will run in parallel. You can override this behavior by setting a value for concurrent_batches.
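As a sketch, here's a microbatch model (hypothetical model and column names) whose reference to its own relation would be auto-detected, causing its batches to run sequentially:

```sql
-- models/user_sessions.sql (hypothetical example)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start',
    begin='2024-10-01',
    batch_size='day'
) }}

select
    session_id,
    session_start
from {{ ref('raw_sessions') }}
-- the reference to the current model's relation below is what
-- dbt detects, forcing the batches to run one after another
where session_id not in (select session_id from {{ this }})
```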

Parallel or sequential execution

Choosing between parallel batch execution and sequential processing depends on the specific requirements of your use case.

  • Parallel batch execution is faster but requires logic independent of batch execution order. For example, if you're developing a data pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction shouldn't depend on the order of how batches are executed or completed.
  • Sequential processing is slower but essential for calculations like cumulative metrics in microbatch models. It processes data in the correct order, allowing each step to build on the previous one.
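For instance, a cumulative daily metric (hypothetical model and column names) depends on earlier batches having completed, so it's a candidate for explicitly disabling parallelism with concurrent_batches:

```sql
-- models/daily_cumulative_sales.sql (hypothetical example)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='order_date',
    begin='2024-01-01',
    batch_size='day',
    concurrent_batches=false  -- each batch builds on the previous one
) }}

select
    order_date,
    sum(amount) as daily_sales
from {{ ref('orders') }}
group by order_date
```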

Configure concurrent_batches

By default, dbt auto-detects whether batches can run in parallel for microbatch models, and this works correctly in most cases. However, you can override dbt's detection by setting the concurrent_batches config in your dbt_project.yml or model .sql file to specify parallel or sequential execution, provided you meet all the conditions:

dbt_project.yml

```yaml
models:
  +concurrent_batches: true # value set to true to run batches in parallel
```
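Alternatively, you can set the config directly in a model's .sql file. A minimal sketch (hypothetical model, column, and source names):

```sql
-- models/raw_events_incremental.sql (hypothetical example)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',
    begin='2024-10-01',
    batch_size='day',
    concurrent_batches=false  -- force sequential batch execution
) }}

select
    event_id,
    event_occurred_at
from {{ ref('raw_events') }}
```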