Skip to main content

Idempotence in dbt

Idempotence is a foundational goal for every dbt project. It means your transformations should be safe to re-run and always produce the same result, regardless of how many times they've run before.

Why this matters early

If you're new to dbt, idempotence is worth understanding before you build incremental models. It's easy to accidentally write logic that works fine the first time but breaks on re-runs.

What idempotence means in dbt

A dbt model is idempotent if it produces the same results whether you run it once or a hundred times. The final state of the data should be identical regardless of how many times you've run the model.

This is because each model is a pure function of your code and the current state of your database, not of how many times it's run before. Given the same inputs, it always produces the same output.

This makes dbt transformations:

  • Safe to retry if a run fails partway through, re-running it won't corrupt your data.
  • Easy to backfill so you can re-run historical data without worrying about double-counting or drift.
  • Predictable in CI so the same code always produces the same data, which makes testing reliable.

Materializations and idempotence

Most dbt materializations are idempotent by default:

MaterializationHow it achieves idempotence
tableRebuilds the table from scratch on every run (create or replace)
viewReplaces the view definition on every run
incrementalRequires deliberate configuration.
materialized_viewCreates or replaces the materialized view definition on each run; the warehouse manages data refresh

Idempotence and incremental models

Incremental models only process new or changed rows, which means they depend on the existing state of the target table. If you're not careful, re-running an incremental model can produce duplicate rows or inconsistent results.

The most common pitfall is appending rows without deduplication or a reliable unique key. Compare these side by side:

Non-idempotent incremental model:

This example filters incrementally but doesn't define a unique_key, so rows at the boundary can be appended again on each subsequent run, which might result in duplicate rows.

-- ❌ Not idempotent: re-runs can duplicate rows 
{{ config(materialized='incremental') }}

select * from {{ source('events', 'raw_events') }}

{% if is_incremental() %}
where event_at >= (select max(event_at) from {{ this }})
{% endif %}

Idempotent incremental model:

This example adds a unique_key, so matching rows are updated or replaced instead of appended as duplicates.

-- ✅ Idempotent when event_id is unique
{{ config(
materialized='incremental',
unique_key='event_id'
) }}

select * from {{ source('events', 'raw_events') }}

{% if is_incremental() %}
where event_at >= (select max(event_at) from {{ this }})
{% endif %}

With unique_key set, dbt updates existing rows and inserts new rows instead of appending duplicates. Depending on your adapter and incremental strategy, dbt does this with merge or delete+insert.

You can also use microbatch incremental models for large time-series datasets. Microbatch models process data in batches based on an event_time column, and can be more resilient for very large incremental workloads.

Common risks

RiskWhy it breaks idempotenceFix
Using current_timestamp() in a modelProduces different values on every runUse a column from the source data as the timestamp
Appending without a unique_keyWithout a unique_key, most adapters/strategies append rows and can duplicate on re-runs.Add unique_key to your incremental config
Generating surrogate keys with random valuesDifferent runs produce different keys for the same rowUse deterministic hashing (for example, dbt_utils.generate_surrogate_key)
Hardcoding "today's date" in logicResults change based on when the model runs, not the dataFilter on source timestamps

Full-refresh as a safety net

When an incremental model gets into a bad state (for example, due to a schema change or logic bug), you can always run dbt run --full-refresh to drop and rebuild the table from scratch. This is the escape hatch that makes incremental models recoverable.

Think of --full-refresh as proof that your underlying logic is still idempotent. Even if the incremental path is optimized, the full result should always be reproducible.

These docs cover the dbt features most affected by idempotence:

Was this page helpful?

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

0
Loading