Skip to main content

freshness

models/<filename>.yml

version: 2

sources:
- name: <source_name>
freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day
error_after:
count: <positive_integer>
period: minute | hour | day
filter: <boolean_sql_expression>
loaded_at_field: <column_name_or_expression>

tables:
- name: <table_name>
freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day
error_after:
count: <positive_integer>
period: minute | hour | day
filter: <boolean_sql_expression>
loaded_at_field: <column_name_or_expression>
...

Definition

A freshness block is used to define the acceptable amount of time between the most recent record, and now, for a table to be considered "fresh".

In the freshness block, one or both of warn_after and error_after can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.

A column name (or expression) that returns a timestamp indicating freshness.

If using a date field, you may have to cast it to a timestamp:

loaded_at_field: "completed_date::timestamp"

Or, depending on your SQL variant:

loaded_at_field: "CAST(completed_date AS TIMESTAMP)"

If using a non-UTC timestamp, cast it to UTC first:

loaded_at_field: "convert_timezone('Australia/Sydney', 'UTC', created_at_local)"

count

(Required)

A positive integer for the number of periods where a data source is still considered "fresh".

period

(Required)

The time period used in the freshness calculation. One of minute, hour or day

filter

(optional)

Add a where clause to the query run by dbt source freshness in order to limit data scanned.

This filter only applies to dbt's source freshness queries - it will not impact other uses of the source table.

This is particularly useful if:

  • You are using BigQuery and your source tables are partitioned tables
  • You are using Snowflake, Databricks or Spark with large tables, and this results in a performance benefit

Examples

Complete example

models/<filename>.yml

version: 2

sources:
- name: jaffle_shop
database: raw

freshness: # default freshness
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}

loaded_at_field: _etl_loaded_at

tables:
- name: customers # this will use the freshness defined above

- name: orders
freshness: # make this a little more strict
warn_after: {count: 6, period: hour}
error_after: {count: 12, period: hour}
# Apply a where clause in the freshness query
filter: datediff('day', _etl_loaded_at, current_timestamp) < 2


- name: product_skus
freshness: # do not check freshness for this table

When running dbt source freshness, the following query will be run:

select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders

where datediff('day', _etl_loaded_at, current_timestamp) < 2

0