freshness

models/<filename>.yml
version: 2
sources:
- name: <source_name>
freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day
error_after:
count: <positive_integer>
period: minute | hour | day
filter: <boolean_sql_expression>
loaded_at_field: <column_name_or_expression>
tables:
- name: <table_name>
freshness:
warn_after:
count: <positive_integer>
period: minute | hour | day
error_after:
count: <positive_integer>
period: minute | hour | day
filter: <boolean_sql_expression>
loaded_at_field: <column_name_or_expression>
...

Definition

A freshness block is used to define the acceptable amount of time between the most recent record, and now, for a table to be considered "fresh".

In the freshness block, one or both of warn_after and error_after can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source.

Additionally, the loaded_at_field is required to calculate freshness for a table. If a loaded_at_field is not provided, then dbt will not calculate freshness for the table.

Freshness blocks are applied hierarchically:

  • a freshness and loaded_at_field property added to a source will be applied to all all tables defined in that source
  • a freshness and loaded_at_field property added to a source table will override any properties applied to the source.

This is useful when all of the tables in a source have the same loaded_at_field, as is often the case.

loaded_at_field

(Required)

A column name (or expression) that returns a timestamp indicating freshness.

If using a date field, you may have to cast it to a timestamp:

loaded_at_field: "completed_date::timestamp"

If using a non-UTC timestamp, cast it to UTC first:

loaded_at_field: "convert_timezone('UTC', 'Australia/Sydney', created_at_local)"

count

(Required)

A positive integer for the number of periods where a data source is still considered "fresh".

period

(Required)

The time period used in the freshness calculation. One of minute, hour or day

filter

(optional)

Add a where clause to the query run by dbt source snapshot-freshness in order to limit data scanned.

This filter only applies to dbt's source freshness queries - it will not impact other uses of the source table.

This is particularly useful if:

  • You are using BigQuery and your source tables are partitioned tables
  • You are using Snowflake or Spark with large tables, and this results in a performance benefit
Changelog

Examples

Complete example

models/<filename>.yml
version: 2
sources:
- name: jaffle_shop
database: raw
freshness: # default freshness
warn_after: {count: 12, period: hour}
error_after: {count: 24, period: hour}
loaded_at_field: _etl_loaded_at
tables:
- name: customers # this will use the freshness defined above
- name: orders
freshness: # make this a little more strict
warn_after: {count: 6, period: hour}
error_after: {count: 12, period: hour}
# Apply a where clause in the freshness query
filter: datediff('day', _etl_loaded_at, current_timestamp) < 2
- name: product_skus
freshness: {} # do not check freshness for this table

When running dbt source snapshot-freshness, the following query will be run:

select
max(_etl_loaded_at) as max_loaded_at,
convert_timezone('UTC', current_timestamp()) as snapshotted_at
from raw.jaffle_shop.orders
where datediff('day', _etl_loaded_at, current_timestamp) < 2