Add snapshots to your DAG

What are snapshots?

Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, snapshots, which records changes to a mutable table over time.

Snapshots implement type-2 Slowly Changing Dimensions over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an orders table where the status field can be overwritten as the order is processed.

id	status	updated_at
1	pending	2024-01-01

Now, imagine that the order goes from "pending" to "shipped". That same record will now look like:

id	status	updated_at
1	shipped	2024-01-02

This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example:

id	status	updated_at	dbt_valid_from	dbt_valid_to
1	pending	2024-01-01	2024-01-01	2024-01-02
1	shipped	2024-01-02	2024-01-02	`null`

Configuring snapshots

Configuration best practices

Use the timestamp strategy where possible

This strategy handles column additions and deletions better than the check strategy.

Use dbt_valid_to_current for easier date range queries

By default, dbt_valid_to is NULL for current records. However, if you set the dbt_valid_to_current configuration (available in dbt Core v1.9+), dbt_valid_to will be set to your specified value (such as 9999-12-31) for current records.

This allows for straightforward date range filtering.

Ensure your unique key is really unique

The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source (example).

How snapshots work

When you run the dbt snapshot command:

On the first run: dbt will create the initial snapshot table — this will be the result set of your select statement, with additional columns including dbt_valid_from and dbt_valid_to. All records will have a dbt_valid_to = null or the value specified in dbt_valid_to_current (available in dbt Core 1.9+) if configured.
On subsequent runs: dbt will check which records have changed or if any new records have been created:
- The dbt_valid_to column will be updated for any existing records that have changed.
- The updated record and any new records will be inserted into the snapshot table. These records will now have dbt_valid_to = null or the value configured in dbt_valid_to_current (available in dbt Core v1.9+).

Snapshots can be referenced in downstream models the same way as referencing models — by using the ref function.

Detecting row changes

Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt:

Timestamp — Uses an updated_at column to determine if a row has changed.
Check — Compares a list of columns between their current and historical values to determine if a row has changed.

Timestamp strategy (recommended)

The timestamp strategy uses an updated_at field to determine if a row has changed. If the configured updated_at column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action.

The timestamp strategy requires the following configurations:

Config	Description	Example
updated_at	A column which represents when the source row was last updated. May support ISO date strings and unix epoch integers, depending on the data platform you use.	`updated_at`

Example usage:

Check strategy

The check strategy is useful for tables which do not have a reliable updated_at column. This strategy works by comparing a list of columns between their current and historical values. If any of these columns have changed, then dbt will invalidate the old record and record the new one. If the column values are identical, then dbt will not take any action.

The check strategy requires the following configurations:

Config	Description	Example
check_cols	A list of columns to check for changes, or `all` to check all columns	`["name", "email"]`

check_cols = 'all'

The check snapshot strategy can be configured to track changes to all columns by supplying check_cols = 'all'. It is better to explicitly enumerate the columns that you want to check. Consider using a surrogate key to condense many columns into a single column.

Example usage

Example usage with `updated_at`

When using the check strategy, dbt tracks changes by comparing values in check_cols. By default, dbt uses the timestamp to update dbt_updated_at, dbt_valid_from and dbt_valid_to fields. Optionally you can set an updated_at column:

If updated_at is configured, the check strategy uses this column instead, as with the timestamp strategy.
If updated_at value is null, dbt defaults to using the current timestamp.

Check out the following example, which shows how to use the check strategy with updated_at:

snapshots:
  - name: orders_snapshot
    relation: ref('stg_orders')
    config:
      schema: snapshots
      unique_key: order_id
      strategy: check
      check_cols:
        - status
        - is_cancelled
      updated_at: updated_at

In this example:

If at least one of the specified check_cols changes, the snapshot creates a new row. If the updated_at column has a value (is not null), the snapshot uses it; otherwise, it defaults to the timestamp.
If updated_at isn’t set, then dbt automatically falls back to using the current timestamp to track changes.
Use this approach when your updated_at column isn't reliable for tracking record updates, but you still want to use it — rather than the snapshot's execution time — whenever row changes are detected.

Hard deletes (opt-in)

Snapshot meta-fields

Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*.

In dbt Core v1.9+ (or available sooner in the "Latest" release track in dbt Cloud):

These column names can be customized to your team or organizational conventions using the snapshot_meta_column_names config.
Use the dbt_valid_to_current config to set a custom indicator for the value of dbt_valid_to in current snapshot records (like a future date such as 9999-12-31). By default, this value is NULL. When set, dbt will use this specified value instead of NULL for dbt_valid_to for current records in the snapshot table.
Use the hard_deletes config to track deleted records as new rows with the dbt_is_deleted meta field when using the hard_deletes='new_record' field.

Field	Meaning	Notes	Example
`dbt_valid_from`	The timestamp when this snapshot row was first inserted and became valid.	This column can be used to order the different "versions" of a record.	`snapshot_meta_column_names: {dbt_valid_from: start_date}`
`dbt_valid_to`	The timestamp when this row became invalidated. For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`.	The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value.	`snapshot_meta_column_names: {dbt_valid_to: end_date}`
`dbt_scd_id`	A unique key generated for each snapshot row.	This is used internally by dbt.	`snapshot_meta_column_names: {dbt_scd_id: scd_id}`
`dbt_updated_at`	The `updated_at` timestamp of the source record when this snapshot row was inserted.	This is used internally by dbt.	`snapshot_meta_column_names: {dbt_updated_at: modified_date}`
`dbt_is_deleted`	A string value indicating if the record has been deleted. (`True` if deleted, `False` if not deleted).	Added when `hard_deletes='new_record'` is configured.	`snapshot_meta_column_names: {dbt_is_deleted: is_deleted}`

All of these column names can be customized using the snapshot_meta_column_names config. Refer to this example for more details.

*The timestamps used for each column are subtly different depending on the strategy you use:

For the timestamp strategy, the configured updated_at column is used to populate the dbt_valid_from, dbt_valid_to and dbt_updated_at columns.

Sample results for the timestamp strategy

Snapshot query results at 2024-01-01 11:00

id	status	updated_at
1	pending	2024-01-01 10:47

Snapshot results (note that 11:00 is not used anywhere):

id	status	updated_at	dbt_valid_from	dbt_valid_to	dbt_updated_at
1	pending	2024-01-01 10:47	2024-01-01 10:47		2024-01-01 10:47

Query results at 2024-01-01 11:30:

id	status	updated_at
1	shipped	2024-01-01 11:05

Snapshot results (note that 11:30 is not used anywhere):

id	status	updated_at	dbt_valid_from	dbt_valid_to	dbt_updated_at
1	pending	2024-01-01 10:47	2024-01-01 10:47	2024-01-01 11:05	2024-01-01 10:47
1	shipped	2024-01-01 11:05	2024-01-01 11:05		2024-01-01 11:05

Snapshot results with hard_deletes='new_record':

id	status	updated_at	dbt_valid_from	dbt_valid_to	dbt_updated_at	dbt_is_deleted
1	pending	2024-01-01 10:47	2024-01-01 10:47	2024-01-01 11:05	2024-01-01 10:47	False
1	shipped	2024-01-01 11:05	2024-01-01 11:05	2024-01-01 11:20	2024-01-01 11:05	False
1	deleted	2024-01-01 11:20	2024-01-01 11:20		2024-01-01 11:20	True

For the check strategy, the current timestamp is used to populate each column. If configured, the check strategy uses the updated_at column instead, as with the timestamp strategy.

Sample results for the check strategy

Snapshot query results at 2024-01-01 11:00

id	status
1	pending

Snapshot results:

id	status	dbt_valid_from	dbt_valid_to	dbt_updated_at
1	pending	2024-01-01 11:00		2024-01-01 11:00

Query results at 2024-01-01 11:30:

id	status
1	shipped

Snapshot results:

id	status	dbt_valid_from	dbt_valid_to	dbt_updated_at
1	pending	2024-01-01 11:00	2024-01-01 11:30	2024-01-01 11:00
1	shipped	2024-01-01 11:30		2024-01-01 11:30

Snapshot results with hard_deletes='new_record':

id	status	dbt_valid_from	dbt_valid_to	dbt_updated_at	dbt_is_deleted
1	pending	2024-01-01 11:00	2024-01-01 11:30	2024-01-01 11:00	False
1	shipped	2024-01-01 11:30	2024-01-01 11:40	2024-01-01 11:30	False
1	deleted	2024-01-01 11:40		2024-01-01 11:40	True

FAQs

How do I run one snapshot at a time?

How often should I run the snapshot command?

What happens if I add new columns to my snapshot query?

Do hooks run with snapshots?

Can I store my snapshots in a directory other than the `snapshot` directory in my project?

Debug Snapshot target is not a snapshot table errors

Related documentation​

What are snapshots?​

Configuring snapshots​

Configuration best practices​

How snapshots work​

Detecting row changes​

Timestamp strategy (recommended)​

Check strategy​

Example usage​

Example usage with updated_at​

Hard deletes (opt-in)​

Snapshot meta-fields​

FAQs​

Related documentation