dbt - Documentation

Testing

Validate your models with dbt

dbt models are only useful if they're correct. It's typical to spend some time after building or updating a model validating that it works how you expect. This process might also involve testing other models, ensuring that you didn't accidentally make a change that has deleterious effects on downstream models. This process is both time consuming and error prone, and it's easy to get lazy with manual testing.

To this end, dbt makes it easy to define automated tests on your models. After writing these tests once, you can quickly run them when developing locally, or in production on an ongoing basis. By testing your models, you can build confidence in their correctness and avoid costly regressions down the line.

Tests are gifts you send to your future self

Testing is an essential part of a mature analytics workflow. It only takes a minute to set up a test -- your future self will thank you!

dbt provides two different mechanisms for data validation: schema tests and custom data tests.

Schema Tests

New in dbt v0.11.0

This document describes version 2 of the schema.yml syntax, added in dbt v0.11.0. For a reference of the version 1 schema.yml syntax, check out the dbt v0.10.0 docs

Schema tests are assertions that a model's schema adheres to basic rules: referential integrity, uniqueness, and nullity, for instance. Modern data warehouses don't enforce these rules, but they're still incredibly useful tools for reasoning about and validating data in a database.

Running Tests

This guide explains the different types of tests built into dbt and demonstrates how to use them. For information on running tests like these, check out the dbt test command reference.

Writing Schema Tests

Schema tests are declared in a schema.yml file. dbt provides four types of schema tests out of the box. Custom schema tests can also be created.

Not null

This test validates that there are no null values present in a column.

version: 2

models:
  - name: people
    columns:
      - name: id
        tests:
          - not_null
              
      - name: account_id
        tests:
          - not_null
              
      - name: status
        tests:
          - not_null

Unique

This test validates that there are no duplicate values present in a field.

version: 2

models:
  - name: people
    columns:
      - name: id
        tests:
          - unique

Relationships

This test validates that all of the records in a child table have a corresponding record in a parent table. This property is referred to as "referential integrity".

The following example tests that every person's account_id maps back to a valid account. Note that the ref function is used here to reference another dbt model.

version: 2

models:
  - name: people
    columns:
      - name: account_id
        tests:
          - relationships:
              to: ref('accounts')
              field: id

Accepted values

This test validates that all of the values in a column are present in a supplied list of values. If any values other than those provided in the list are present, then the test will fail.

version: 2

models:
  - name: people
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'cancelled']

Putting it all together

The tests above were shown in separate files for clarity, but in practice, they should all be combined into a single file. This combined schema.yml file would look like this:

version: 2

models:
  - name: people
    columns:
      - name: id
        tests:
          - not_null
          - unique
              
      - name: account_id
        tests:
          - not_null
          
      - name: account_id
        tests:
          - relationships:
              to: ref('accounts')
              field: id
            
      - name: status
        tests:
          - not_null
          - accepted_values:
              values: ['active', 'cancelled']

Testing Expressions

Sometimes, a constraint only holds for a combination of columns. For example, maybe the combination of user_id and date is unique, but neither column is unique in isolation.

To test expressions like this, specify the tests directive at the model level, and supply your expression as an argument. For simple tests like unique and not_null, you can just pass the expression as a string. For accepted_values and relationships tests, you should supply an option named field as shown below.

version: 2

models:
  - name: people
    tests:
      - unique:
          column_name: "concat(date_day, user_id)"
      
      - not_null:
          column_name: "coalesce(status, new_status)"
      
      - accepted_values:
          field: "coalesce(status, new_status)"
          values: ['active', 'cancelled']
        
      - relationships:
          field: "coalesce(account_id, parent_account_id)"
          to: ref('accounts')
          field: id

It is recommended that users specify tests for as many constraints as can be reasonably identified in their database. This may result in a large number of total tests, but schema.yml makes it fast to create and modify these tests. The presence of tests like these can significantly increase the confidence in underlying data consistency in a database.

Custom data tests

Not all error conditions can be expressed in a schema test. For this reason, dbt provides a mechanism for testing arbitrary assertions about your data. These data tests are sql SELECT statements that return 0 rows on success, or more than 0 rows on failure.

Example custom data test

A typical data test might look like:

-- If >= 5% of cookie_ids are null, then the test returns 1 row (failure).
-- If < 5% of cookie_ids are null, then the test returns 0 rows (success)

with calc as (

    select
        case
            when cookie_id is null then 1
            else 0
        end as cookie_is_null
  
    from {{ ref('events') }}
  
),

agg as (
  
    select
        sum(cookie_is_null)::float / nullif(count(*), 0) as pct_null

    from calc

)

select *
from agg
where pct_null < 0.05

Enabling custom data tests

To enable data tests, add the test-paths config to your dbt_project.yml file:

test-paths: ["tests"]        # look for *.sql files in the "tests" directory

Running tests

Schema tests and custom data tests can be run with the dbt test command as shown below.

dbt test           # run schema + data tests
dbt test --schema  # run only schema tests
dbt test --data    # run only data tests

A full description of the options for dbt test can be found in the command reference section.