dbt - Documentation


profiles.yml defines how dbt connects to your data warehouse(s). By default, dbt expects this file to be located at ~/.dbt/profiles.yml, but this location can be changed on the command line with the --profiles-dir option.

Your profile is specific to you, and contains sensitive information about your warehouse. Never check your profile into your dbt project's git repository, or share your profile with others.

# This configuration file specifies information about connections to
# your data warehouse(s). The file contains a series of "profiles."
# Profiles specify database credentials and connection information
# By default, dbt looks for this file in ~/.dbt/profiles.yml. That option
# can be configured when dbt is invoked with the --profiles-dir option:
#  $ dbt run --profiles-dir /opt/dbt/

# Profiles should adhere to the structure defined below:
# [profile-name]:
#   outputs:
#     [target-name]:
#       type: {redshift, postgres}
#       threads: [1 - 8]
#       host: [host ip or fully qualified domain name]
#       port: [port]
#       user: [user]
#       pass: [password]
#       dbname: [dbname]
#       schema: [schema name]
#   target: [target-name]
# Commonly, it's helpful to define multiple targets for a profile. For example,
# these targets might be `dev` and `prod`. Whereas the `dev` target points to
# a development schema (eg. dbt_dev), the `prod` target should point to the
# prod schema (eg. analytics). Analytical/BI tools should point to the
# prod schema so that local development does not interfere with analysis.
# In practice, a profile might look like this:

evil-corp:
    outputs:
        dev:                      # specify the dev connection
            type: redshift
            threads: 1
            port: 5439
            user: elliot
            pass: pa55word
            dbname: warehouse
            schema: dbt_elliot    # use the dev schema
        prod:                     # specify the prod connection
            type: redshift
            threads: 1
            port: 5439
            user: elliot
            pass: pa55word
            dbname: warehouse
            schema: analytics     # use the prod schema instead
        snowflake:                # specify the snowflake connection
            type: snowflake
            threads: 1
            account: evilcorp     # the url prefix for your snowflake connection,
                                  # i.e. evilcorp.snowflakecomputing.com
            user: elliot
            password: pa55word
            role: SYSADMIN        # optional, the snowflake role you want to use
                                  # when connecting
            database: db
            warehouse: warehouse
            schema: analytics     # use the prod schema instead

    target: dev                   # default target is dev unless changed at run time

mr-robot:
    outputs:
        dev:                      # specify the dev connection
            type: postgres
            threads: 1
            port: 5439
            user: mr_robot
            pass: password1
            dbname: warehouse
            schema: dbt_mr_robot  # use the dev schema
        prod:                     # specify the prod connection
            type: postgres
            threads: 1
            port: 5439
            user: mr_robot
            pass: password1
            dbname: warehouse
            schema: analytics     # use the prod schema instead

    target: dev                   # default target is dev unless changed at run time

# You can switch between profiles and targets on the command line. All of the
# following are valid ways to invoke dbt run/test/compile/etc
# $ dbt run --profile evil-corp
# $ dbt run --profile evil-corp --target dev
# $ dbt run --profile evil-corp --target prod
# $ dbt run --profile mr-robot
# $ dbt run --profile mr-robot --target dev
# $ dbt run --profile mr-robot --target prod

When you create a dbt project with dbt init, dbt creates a file in the root of the project called dbt_project.yml. This file contains the configuration for the project. You can see an exhaustive list of the fields available below.

# This configuration file specifies information about your package
# that dbt needs in order to build your models. This file is _required_
# For more information, consult:
# http://dbt.readthedocs.io/en/master/guide/configuring-models/

# Package Configuration

# name: Required. This is the name used to reference your package in the configs
# below. Package names must only contain letters and underscores
name: 'your_package_name'

# version: Required. This indicates the current version of your package and
# should conform to semantic versioning. The field is currently unused
version: '0.0.1'

# File Path Configurations

# The following configs specify directories that dbt uses to operate. All
# paths are specified relative to the path of dbt_project.yml

# source-paths: Required. Specify which path(s) dbt should look in to find
# models. Models are text files with a .sql file extension.
source-paths: ["models"]

# analysis-paths: Optional. Specify which path(s) dbt should look in to find
# analytical queries. These queries are compiled, but not executed, by dbt.
analysis-paths: ["analysis"]

# target-path: Required. Specify which path dbt should write compiled SQL to.
target-path: "target"

# test-paths: Optional. Specify which path(s) dbt should look in to find data
# test definitions.
test-paths: ["test"]

# data-paths: Optional. Specify which path(s) dbt should look in to find CSV
# files. Running `dbt seed` will load these CSVs as tables in your warehouse
data-paths: ["data"]

# macro-paths: Optional. Specify which path(s) dbt should look in to find
# macros. These macros will be globally available to all models in your project
macro-paths: ['macros']

# log-path: Optional. Specify which path dbt should write debug logs to.
log-path: "logs"

# clean-targets: Optional. Specify which path(s) should be removed by the
# clean task. Run `dbt clean` to delete these directories
clean-targets: ["target", "dbt_modules"]

# Connection Configurations

# profile: Required. This config specifies which profile dbt should use to
# connect to your data warehouse. The provided value must exist in the
# profiles.yml file.
profile: "evil-corp"

# Model Configurations

# The following section contains configurations that define how your models are
# instantiated by dbt. The `models` config below can specify options for both
# your package, and any packages included as dependencies.
# Options are specified on a per-package, per-directory, and per-model basis.
# The configs are inherited, so configs applied to a package can be overridden
# for directories and models contained within that package.
# The configuration structure within a package should mirror the directory
# structure within that package. The example configs provided below are based
# on the following directory structure.
# dbt_project.yml
# models/
# ├── adwords
# │   └── adwords_ads.sql
# └── snowplow
#     ├── base
#     │   └── snowplow_events.sql
#     └── snowplow_sessions.sql

models:
    enabled: true        # configs defined here are applied to _all_ packages
    materialized: view   # but can be overridden in more specific configs below

    # pre- and post- hooks can be defined anywhere within the model hierarchy.
    # when defined at the root level (as they are here), they apply to all models
    # in all packages. These hooks are compiled into the model SQL and run either
    # before or after the model is materialized.
    pre-hook:
        - "insert into audit (model, state, time) values ('{{this.name}}', 'start', getdate())"

    post-hook:
        - "grant select on {{this}} to user_1"
        - "insert into audit (model, state, time) values ('{{this.name}}', 'end', getdate())"

    # This is the configuration for the models in your local package. The name
    # `your_package_name` is defined above.
    your_package_name:
        # Applies to all SQL files found under ./models/adwords/
        adwords:
            enabled: false
            # Applies to the specific model ./models/adwords/adwords_ads.sql
            adwords_ads:
                enabled: true
                materialized: table

        # Applies to all SQL files found under ./models/snowplow/
        snowplow:
            # Applies to all SQL files found under ./models/snowplow/base
            base:
                # Applies to model ./models/snowplow/base/snowplow_events.sql
                snowplow_events:
                    materialized: table
                    sort: ['timestamp', 'userid']
                    sort_type: interleaved
                    dist: 'userid'

            # Applies to model ./models/snowplow/snowplow_sessions.sql
            snowplow_sessions:
                materialized: incremental
                sql_where: "created_at > (select max(created_at) from {{ this }})"
                unique_key: "id"
                sort: "timestamp"
                dist: "user_id"

    # This is the configuration for the "quickbooks" open source package
    # which is included in the `repositories` section below.
    # These configs override those defined within the quickbooks package
    quickbooks:
        base:
            materialized: ephemeral
        transform:
            materialized: view

# Run Start & Complete Hooks

# Like the pre- and post- hooks above, the on-run-start and on-run-end configs
# make it possible to run SQL at the very beginning, and very end of a dbt run.
    - "create table if not exists audit (model text, state text, time timestamp)"

on-run-end:
    - 'grant usage on schema "{{ target.schema }}" to db_reader'
    - 'grant select on all tables in schema "{{ target.schema }}" to db_reader'

# Package configurations

# repositories: Optional. Contains a list of packages (git repositories) that
# should be downloaded and included in this project. When a package is
# included, its models are added to the dbt model graph and executed with
# `dbt run`. These configs can be specified above (as with quickbooks).
# Each repository should be a git-cloneable url. Branches and Tags can be
# specified by adding @branch-or-version to the end of the url, ex:
# - https://github.com/fishtown-analytics/quickbooks
# - https://github.com/fishtown-analytics/quickbooks@v0.1.0
# - https://github.com/fishtown-analytics/quickbooks@master

repositories:
  - https://github.com/fishtown-analytics/quickbooks

# Archival

# Data archival is a powerful feature intended for advanced dbt users. For more
# information, consult: https://dbt.readme.io/docs/archival

dbt run executes compiled sql model files against the current target database. dbt connects to the target database and runs the relevant SQL required to materialize all data models using the specified materialization strategies. Models are run in the order defined by the dependency graph generated during compilation. Intelligent multi-threading is used to minimize execution time without violating dependencies.

Deploying new models frequently involves destroying prior versions of these models. In these cases, dbt run minimizes the amount of time in which a model is unavailable by first building each model with a temporary name, then dropping the existing model, then renaming the model to its correct name. The drop and rename happen within a single database transaction for database adapters that support transactions.
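The swap described above can be sketched in SQL. This is an illustrative sequence, not dbt's literal output; the table names and exact DDL vary by adapter and materialization:

```sql
begin;

-- 1. build the new version of the model under a temporary name
create table analytics.my_model__dbt_tmp as (
    select * from public.raw_data
);

-- 2. drop the existing model
drop table if exists analytics.my_model cascade;

-- 3. rename the new model to its correct name
alter table analytics.my_model__dbt_tmp rename to my_model;

commit;
```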

Specifying models to run

By default, dbt run will execute all of the models in the dependency graph. During development (and deployment), it is useful to specify only a subset of models to run. Use the --models flag with dbt run to select a subset of models to run. Note that the following arguments (--models and --exclude) also apply to dbt test!

The --models flag accepts one or more arguments. Each argument can be one of:

  1. a package name
  2. a model name
  3. a path hierarchy to a models directory


dbt run --models my_dbt_project_name   # runs all models in your project
dbt run --models my_dbt_model          # runs a specific model
dbt run --models path.to.my.models     # runs all models in a specific directory
dbt run --models my_package.some_model # run a specific model in a specific package

# multiple arguments can be provided to --models
dbt run --models my_project other_project

# these arguments can be projects, models, or directory paths
dbt run --models my_project path.to.models a_specific_model

Additionally, arguments to --models can be modified with the + and * operators. If placed at the front of the model specifier, + will select all parents of the selected model(s). If placed at the end, + will select all children of the selected models. The * operator matches all models within a package or directory.

dbt run --models my_model+          # select my_model and all children
dbt run --models +my_model          # select my_model and all parents

dbt run --models my_package.*       # select all models in my_package
dbt run --models path.to.models.*   # select all models in path/to/models

dbt run --models my_package.*+      # select all models in my_package and their children
dbt run --models +some_model+       # select some_model and all parents and children

Finally, dbt provides an --exclude flag with the same semantics as --models. Models specified with the --exclude flag will be removed from the set of models selected with --models

dbt run --models my_package.*+ --exclude my_package.a_big_model+

Run dbt non-destructively

If you provide the --non-destructive argument to dbt run, dbt will minimize the amount of destructive changes it runs against your database. Specifically, dbt will:

  1. Ignore models materialized as views
  2. Truncate tables and re-insert data instead of dropping and re-creating these tables

This flag is useful for recurring jobs which only need to update table models and incremental models. dbt will not create, drop, or modify views if the --non-destructive flag is provided.

dbt run --non-destructive

Refresh incremental models

If you provide the --full-refresh argument to dbt run, dbt will treat incremental models as table models. This is useful when

  1. The schema of an incremental model changes and you need to recreate it
  2. You want to reprocess the entirety of the incremental model because of new logic in the model code

dbt run --full-refresh

In model code, this flag will be reflected as flags.FULL_REFRESH, for example:

select * from all_events

-- if the table already exists and `--full-refresh` is
-- not set, then only add new records. otherwise, select
-- all records.
{% if already_exists(this.schema, this.name) 
      and not flags.FULL_REFRESH %}
   where collector_tstamp > (
     select coalesce(max(max_tstamp), '0001-01-01') from {{ this }}
   )
{% endif %}

dbt test runs tests on data in deployed models. There are two types of tests:

  • schema validations, declared in a schema.yml file.
  • custom data tests, written as SQL SELECT statements.

dbt test runs both types of test and reports the results to the console.

The tests to run can be selected using the --models flag discussed here.

# run tests for one_specific_model
dbt test --models one_specific_model

# run tests for all models in package
dbt test --models some_package.*

# run only custom data tests
dbt test --data

# run only schema tests
dbt test --schema

# run custom data tests for one_specific_model
dbt test --data --models one_specific_model

# run schema tests for one_specific_model
dbt test --schema --models one_specific_model

For more information on writing tests, see the Testing Documentation.

dbt deps pulls the most recent version of the dependencies listed in your packages.yml from git. See Package-Management for more information.


dbt --version is a utility function to check the version of your installed dbt client.


Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. While some systems are built in a way that makes accessing historical data possible, this is often not the case. dbt provides a mechanism, dbt archive, to record the changes made to a mutable table over time.

To use dbt archive, declare the tables you want to archive in your dbt_project.yml file:

archive:
  - source_schema: production_data         # schema to look for tables in
    target_schema: dbt_archive             # where to archive the data to
    tables:                                # list of tables to archive
      - source_table: users                # table to archive (production_data.users)
        target_table: users_archived       # table to insert archived data into
        updated_at: updated_at             # used to determine when data has changed
        unique_key: id                     # used to generate archival query
      - source_table: other_table          # production_data.other_table
        target_table: other_table_archive  # dbt_archive.other_table_archive
        updated_at: "updatedAt"
        unique_key: "expressions || work || LOWER(too)"
  - source_schema: some_other_schema       # multiple source schemas can be archived

The archived tables will mirror the schema of the source tables they are generated from. In addition, three fields are added to the archive table:

  1. valid_from: The timestamp when this archived row was inserted (and first considered valid)
  2. valid_to: The timestamp when this archived row became invalidated. The first archived record for a given unique_key has valid_to = NULL. When newer data is archived for that unique_key, the valid_to field of the old record is set to the valid_from field of the new record.
  3. scd_id: A unique key generated for each archive record. scd = Slowly Changing Dimension.

dbt models can be built on top of these archived tables. The most recent record for a given unique_key is the one where valid_to is null.
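Using the users_archived table from the example config above, the current and historical rows for each unique_key can be selected like this (a sketch):

```sql
-- current records: the newest archive row for a unique_key has valid_to = NULL
select * from dbt_archive.users_archived
where valid_to is null;

-- point-in-time records: rows that were valid at a given timestamp
select * from dbt_archive.users_archived
where valid_from <= '2018-01-01'
  and (valid_to is null or valid_to > '2018-01-01');
```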

To run this archive process, use the command dbt archive. After testing and confirming that the archival works, you should schedule this process to run on a recurring basis.

dbt init [project_name] performs several actions necessary to create a new dbt project:

  • creates a ~/.dbt/profiles.yml file if one does not already exist
  • creates a new folder called [project_name]
  • generates directories and sample files necessary to get started with dbt

dbt clean is a utility function that deletes all folders specified in the clean-targets list specified in dbt_project.yml. This is useful for deleting the dbt_modules and target directories.

dbt debug is a utility function to show debug information.

The --config-dir option to dbt debug will show you where your .dbt configuration directory is located:

$ dbt debug --config-dir
To view your profiles.yml file, run:

open /Users/alice/.dbt

dbt seed loads data from csv files into your data warehouse. Because these csv files are located in your dbt repository, they are version controlled and code reviewable. Thus, dbt seed is appropriate for loading static data which changes infrequently.

The dbt seed command will load csv files located in the data-paths directory of your dbt project into your data warehouse. You can configure the data-paths directory by adding the following line to your dbt_project.yml file:

data-paths: ["data"] # default is './data'

Assuming you have a csv file that looks like this:

country_code,country_name
US,United States
GB,United Kingdom
CA,Canada

Running dbt seed with the above csv located at data/country_codes.csv will create a table in your data warehouse with two columns: country_code and country_name.

To see a sample of the data loaded by dbt, use the --show argument to dbt seed:

$ dbt seed --show

Found 2 models, 3 tests, 0 archives, 0 analyses, 53 macros, 0 operations, 1 seed file

14:46:15 | Concurrency: 1 threads (target='dev')
14:46:15 |
14:46:15 | 1 of 1 START seed file analytics.country_codes........................... [RUN]
14:46:15 | 1 of 1 OK loaded seed file analytics.country_codes....................... [INSERT 3 in 0.01s]
14:46:16 |
14:46:16 | Finished running 1 seed in 0.14s.

Random sample of table: analytics.country_codes
| country_code | country_name   |
| ------------ | -------------- |
| GB           | United Kingdom |
| CA           | Canada         |
| US           | United States  |

Completed successfully


In addition to the standard --profile and --target arguments, dbt seed also accepts the --full-refresh argument. If provided, dbt will drop and re-create the specified table instead of truncating and inserting new data. This is useful if the schema of the csv file changes in a way which is incompatible with the existing table.

As of dbt 0.9.2, seed files can be used with the ref() function in models. In practice, this looks like:

-- This refers to the table created from data/country_codes.csv
select * from {{ ref('country_codes') }}

Configuring seeds

Basic configuration

Seed files can be configured using the same semantics as models. With these configurations, you can selectively enable or disable seed files, or configure them to materialize in a custom schema. To configure seed files, use the seeds: option in your dbt_project.yml file.

data-paths: ["data"]


seeds:
    materialized: view
    enabled: true
    schema: seed_data
    post-hook: "grant select on {{ this }} to bi_user"

Override column types

New in 0.10.1

In addition to configuring schemas, the column types of a seed file can also be configured in the dbt_project.yml file. If this configuration is not provided, dbt will try to infer column types from a sample of the seed data. Note that if you change these types, you'll need to run dbt seed in --full-refresh mode to update the table schema. Example usage:

data-paths: ["data"]


seeds:
    materialized: view
    enabled: true
    schema: seed_data
    # This configures data/country_codes.csv
    country_codes:
      # Override column types
      column_types:
        country_code: varchar(2)
        country_name: varchar(32)

dbt compile generates executable SQL from source model, test, and analysis files. You can find these compiled SQL files in the target/ directory of your dbt project.

The compile command is useful for:

  1. Visually inspecting the compiled output of model files. This is useful for validating complex jinja logic or macro usage.
  2. Manually running compiled SQL. While debugging a model or schema test, it's often useful to execute the underlying select statement to find the source of the bug.
  3. Compiling analysis files. Read more about analysis files here.
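For example, given a model that uses ref(), the compiled file in target/ contains plain SQL with the schema interpolated. The model name and compiled schema below are illustrative:

```sql
-- models/my_model.sql (source)
select * from {{ ref('country_codes') }}

-- target/compiled/<project>/my_model.sql (after dbt compile)
select * from "analytics"."country_codes"
```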

adapter is a wrapper around the internal database adapter used by dbt. It allows users to make calls to the database in their dbt models.

The following functions are available on adapter:

class DatabaseWrapper(object):
    """Wrapper for runtime database interaction. Should only call adapter"""

    def already_exists(self, schema, table):
        """Returns true if a relation named like `table` exists in schema
        `schema`, false otherwise."""

    def get_columns_in_table(self, schema_name, table_name):
        """Returns a list of Columns in a table. Useful for writing
        `INSERT ... SELECT` queries."""

    def get_missing_columns(self, from_schema, from_table,
                            to_schema, to_table):
        """Returns the set of Columns that is the difference of the
        columns in the `from_table` and the columns in the `to_table`,
        i.e. (`set(from_table.columns) - set(to_table.columns)`).
        Useful for detecting new columns in a source table."""

adapter.already_exists

  • schema: The schema to test
  • table: The relation to look for

Returns true if a relation named like table exists in schema schema, false otherwise.

(Source on Github)

select * from {{ref('raw_table')}}

{% if adapter.already_exists(this.schema, this.name) %}
  where id > (select max(id) from {{this}})
{% endif %}

adapter.get_columns_in_table

  • schema_name: The schema to test
  • table_name: The table (or view) from which to select columns

Returns a list of Columns in a table.

{% set dest_columns = adapter.get_columns_in_table(schema, identifier) %}
{% set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') %}

insert into {{ this }} ({{ dest_cols_csv }}) (
  select {{ dest_cols_csv }}
  from {{ref('another_table')}}
)

adapter.get_missing_columns

  • from_schema: The schema for the from_table
  • from_table: The from_table to check for differences
  • to_schema: The schema for the to_table
  • to_table: The to_table to check for differences

Returns the set of Columns that is the difference of the columns in the from_table
and the columns in the to_table, i.e. (set(from_table.columns) - set(to_table.columns)).
Useful for detecting new columns in a source table.

(Source on Github)

{% for col in adapter.get_missing_columns(this.schema, 'source_table', this.schema, this.name) %}
  alter table {{this}} add column "{{col.name}}" {{col.data_type}};
{% endfor %}

adapter.query_for_existing

  • schemas: A string, or list of strings, of schemas to query for existing views and tables

Returns a dictionary that maps table names to their relation type. Relation types can be either view or table.

(Source on Github)

{% for (relation, type) in adapter.query_for_existing('snowplow').items() %}
  {% if type == 'table' %}
    {{ vacuum(adapter.quote_schema_and_table('snowplow', relation)) }}
  {% endif %}
{% endfor %}



invocation_id outputs a UUID generated for this dbt run (useful for auditing)
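For example, a post-hook could tag audit records with the run's UUID. The audit table and its invocation_id column here are illustrative:

```sql
insert into audit (model, invocation_id, time)
values ('{{ this.name }}', '{{ invocation_id }}', getdate())
```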




run_started_at outputs the timestamp that this run started, e.g. 2017-04-21 01:23:45.678.

The run_started_at variable is a Python datetime object. As of 0.9.1, the timezone of this variable
defaults to UTC.

select
	'{{ run_started_at.strftime("%Y-%m-%d") }}' as date_day
from ...

To modify the timezone of this variable, use the pytz module:

select
	'{{ run_started_at.astimezone(modules.pytz.timezone("America/New_York")) }}' as run_started_est
from ...

this makes schema information about the currently executing model available. It is useful in any context in which you need to write code that references the current model, for example when defining a sql_where clause for an incremental model, or when writing pre- and post-model hooks that operate on the model in some way. Developers have options for how to use this:

DBT Model Syntax

| dbt Model Syntax | Value                                         |
| ---------------- | --------------------------------------------- |
| {{this}}         | "schema"."table" or "schema"."table__dbt_tmp" |
| {{this.schema}}  | schema                                        |
| {{this.table}}   | table or table__dbt_tmp, depending on whether or not dbt created a temporary table for this model |
| {{this.name}}    | the name of the model                         |
Here's an example of how to use this in dbt_project.yml to write a sql_where clause that only inserts the new records since the most recent timestamp found in the table:

  materialized: incremental
  sql_where: "session_end_tstamp > (select max(session_end_tstamp) from {{this}})"

Here's an example of how to use this in dbt_project.yml to grant select rights on a table to a different db user.

      - "grant select on {{ this }} to db_reader"

target makes available information about which database and schema this dbt run is executing against.

DBT Model Syntax

| dbt Model Syntax                                                       | Description                                                                      |
| ---------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| {{target.name}}                                                         | Name of the active target                                                        |
| {{target.dbname}} (Postgres/Redshift), {{target.database}} (Snowflake) | Database name specified in active target. This varies by database type.          |
| {{target.host}}                                                         | Host specified in active target                                                  |
| {{target.schema}}                                                       | Schema specified in active target                                                |
| {{target.type}}                                                         | Database type (postgres / redshift / snowflake / ...) specified in active target |
| {{target.user}}                                                         | User specified in active target                                                  |
| {{target.port}}                                                         | Port specified in active target                                                  |
| {{target.threads}}                                                      | Number of threads specified in active target                                     |

The most important function in dbt is ref(); it's impossible to build even moderately complex models without it. ref() is how you reference one model within another. This is a very common behavior, as typically models are built to be "stacked" on top of one another. Here is how this looks in practice:

-- model_a.sql
select *
from public.raw_data

-- a second model, which builds on top of model_a
select *
from {{ref('model_a')}}

ref() is, under the hood, actually doing two important things. First, it is interpolating the schema into your model file to allow you to change your deployment schema via configuration. Second, it is using these references between models to automatically build the dependency graph. This will enable dbt to deploy models in the correct order when using dbt run.

New in 0.9.0

The {{ ref }} function returns a Relation object that has the same table, schema, and name attributes as the {{ this }} variable.
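This means the value returned by ref() can be stored in a variable and inspected (a sketch):

```sql
{% set relation = ref('country_codes') %}

-- relation.schema, relation.table, and relation.name are available,
-- just as they are on the `this` variable
select * from {{ relation }}
```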

Advanced ref usage

There is also a two-argument variant of the ref function. With this variant, you can pass both a package name and model name to ref to avoid ambiguity. This functionality is not commonly required for typical dbt usage.

select * from {{ ref('package_name', 'model_name') }}

Variables can be passed from your dbt_project.yml file into models during compilation.
These variables are useful for configuring packages for deployment in multiple environments, or defining values that should be used across multiple models within a package.

To add a variable to a model, use the var() function:

select * from events where event_type = '{{ var("event_type") }}'

If you try to run this model without supplying an event_type variable, you'll receive
a compilation error that looks like this:

Encountered an error:
! Compilation error while compiling model package_name.my_model:
! Required var 'event_type' not found in config:
Vars supplied to package_name.my_model = {

To supply a variable to a given model, add one or more vars dictionaries to the models
config in your dbt_project.yml file. These vars are in-scope for all models at or below
where they are defined, so place them where they make the most sense. Below are three different
placements of the vars dict, all of which will make the my_model model compile.

# 1) scoped at the model level
models:
  package_name:
    my_model:
      materialized: view
      vars:
        event_type: activation

# 2) scoped at the package level
models:
  package_name:
    vars:
      event_type: activation
    my_model:
      materialized: view

# 3) scoped globally
models:
  vars:
    event_type: activation
  package_name:
    my_model:
      materialized: view

Variable default values

The var() function takes an optional second argument, default. If this argument is provided, then it will be the default value for the variable if one is not explicitly defined.

-- Use 'activation' as the event_type if the variable is not defined.
select * from events where event_type = '{{ var("event_type", "activation") }}'

The env_var function can be used to incorporate Environment Variables from the system into your dbt project. This env_var function can be used 1) in the profiles.yml file or 2) in model sql.

When used in a profiles.yml file (to avoid putting credentials on a server), it can be used like this:

my-profile:                       # the profile name here is illustrative
  target: prod
  outputs:
    prod:
      type: postgres
      # IMPORTANT: Make sure to quote the entire Jinja string here
      user: "{{ env_var('DBT_USER') }}"
      pass: "{{ env_var('DBT_PASSWORD') }}"

If the DBT_USER and DBT_PASSWORD environment variables are present when dbt is invoked, then these variables will be pulled into the profile as expected. If any environment variables are not set, then dbt will raise a compilation error.

Quoting, Curly Brackets, & You

Be sure to quote the entire jinja string (as shown above), or else the yaml parser will be confused by the Jinja curly brackets.


  • msg: The message (string) to log
  • info: If False, write to the log file. If True, write to both the log file and stdout (default=False)

Logs a line to either the log file or stdout.

(Source on Github)

{% macro some_macro(arg1, arg2) %}

	{{ log("Running some_macro: " ~ arg1 ~ ", " ~ arg2) }}

{% endmacro %}

schema is the schema that the model is configured to be materialized in. This is typically the same as model['schema'].




Use this variable to throw an exception. This is typically only useful in macros or materializations when invalid arguments are provided by the calling model. The exceptions variable points to the exceptions module in dbt.

Example usage:

{% if number < 0 or number > 100 %}
  {{ exceptions.raise_compiler_error("Invalid `number`. Got: " ~ number) }}
{% endif %}

Note that throwing an exception will cause a model to fail, so please use this variable with care!

The flags variable points to the flags module in dbt. Use this variable to determine properties of the active dbt run.

Example usage:

{% if flags.NON_DESTRUCTIVE %}
-- no-op
{% else %}
drop table ...
{% endif %}

The list of valid flags includes:

  • flags.FULL_REFRESH
  • flags.NON_DESTRUCTIVE


  • data: The data to return to the caller

The return function can be used in macros to return data to the caller. The type of the data (dict, list, int, etc) will be preserved through the return call.

(Source on Github)

{% macro get_data() %}

  {{ return([1,2,3]) }}
{% endmacro %}

  -- get_data() returns a list!
  {% for i in get_data() %}
    {{ i }}
    {% if not loop.last %},{% endif %}
  {% endfor %}

The api variable in the Jinja context contains useful objects and methods for building dbt models and macros.


New in 0.10.1

The Relation object is used to interpolate schema and table names into SQL code with appropriate quoting. This object should always be used instead of interpolating values with {{ schema }}.{{ table }} directly. Quoting of the Relation object can be configured in the Configuring Quoting section of the documentation.

Creating Relations

A Relation can be created in two ways: from a node, or from scratch.

1. Creating a Relation from a node:

class Relation:
  def create_from_node(node, table_name=None):
    node: The dbt node to create a relation for. In the context of a model,
          this will be the `model` context variable
    table_name (optional): An optional override for the "identifier" part of
          the Relation. If not provided, the Relation will use `node['name']`.
          This is useful when the name of a table differs from the name of
          the model, as with `__dbt_tmp` suffixes. You likely won't need to
          use this argument outside of building custom materializations.

2. Creating a Relation from scratch:

class Relation:
  def create(database=None, schema=None, identifier=None,
             table_name=None, type=None):
    database (optional): The name of the database for this relation
    schema (optional): The name of the schema (or dataset, if on BigQuery) for this relation
    identifier (optional): The name of the identifier for this relation
    table_name (optional): An optional override for the "table" part of
          the Relation. If not provided, the Relation will use `node['name']`.
          This is useful when the name of a table differs from the name of
          the model, as with `__dbt_tmp` suffixes. You likely won't need to
          use this argument outside of building custom materializations.
    type (optional): Metadata about this relation, eg: "table", "view", "cte"

Using Relations

{% set relation = api.Relation.create(schema='snowplow', identifier='events') %}

-- Return the `database` for this relation
{{ relation.database }}

-- Return the `schema` (or dataset) for this relation
{{ relation.schema }}

-- Return the `identifier` for this relation
{{ relation.identifier }}

-- Return the `table` for this relation (deprecated; exists for backwards compatibility)
{{ relation.table }}

-- Return true if the relation is a table
{{ relation.is_table }}

-- Return true if the relation is a view
{{ relation.is_view }}

-- Return true if the relation is a cte
{{ relation.is_cte }}
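When a Relation is interpolated directly into SQL, it renders as a fully qualified, appropriately quoted name, which makes it safe to use in queries:

```sql
{% set relation = api.Relation.create(schema='snowplow', identifier='events') %}

-- Renders the schema and identifier with appropriate quoting, eg. "snowplow"."events"
select * from {{ relation }}
```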

The Column object is used to encode information about a column in a relation.

class Column(object):
  def __init__(self, column, dtype, char_size=None, numeric_size=None):
      column: The name of the column represented by this object
      dtype: The data type of the column (database-specific)
      char_size: If dtype is a variable width character type, the size of the column, or else None
      numeric_size: If dtype is a fixed precision numeric type, the size of the column, or else None
# Example Usage:
col = Column('name', 'varchar', 255)
col.is_string() # True
col.is_numeric() # False
col.string_type() # character varying (255)

Column API


Properties:

name: Returns the name of the column
quoted: Returns the name of the column wrapped in quotes
data_type: Returns the data type of the column

Instance methods:

is_string(): Returns True if the column is a String type (eg. text, varchar), else False
is_numeric(): Returns True if the column is a Numeric type (eg. numeric), else False
string_size(): Returns the width of the column if it is a string type, else, an exception is raised

Static methods:

string_type(size): Returns a database-useable representation of the string type (eg. character varying(255))
numeric_type(dtype, size): Returns a database-useable representation of the numeric type (eg. numeric(12, 4))

BigQuery Columns

The Column type is overridden as a BigQueryColumn in BigQuery dbt projects. This object works the same as the Column type described above, with some additional properties and methods:


Properties:

fields: Returns the list of subfields contained within a field (if the column is a STRUCT)
mode: Returns the "mode" of the column, eg. REPEATED

Instance methods:

flatten(): Return a flattened list of BigQueryColumns in which subfields are expanded into their own columns. For example, this nested field:

[{"hits": {"pageviews": 1, "bounces": 0}}]

will be expanded to:

[{"hits.pageviews": 1, "hits.bounces": 0}]

statement blocks


Statements are SQL queries that hit the database and return results to your Jinja context. Here's an example of a statement which fetches all of the states from a users table.

{%- call statement('states', fetch_result=True) -%}

    select distinct state from {{ ref('users') }}

{%- endcall -%}

The signature of the statement block looks like this:

statement(name=None, fetch_result=False, auto_begin=True)


  • name (string): The name for the result set returned by this statement
  • fetch_result (bool): If True, load the results of the statement into the Jinja context
  • auto_begin (bool): If True, open a transaction if one does not exist. If False, do not open a transaction.

Once the statement block has executed, the result set is accessible via the load_result function. For the above statement, that would look like:

{%- set states = load_result('states') -%}
{%- set states_data = states['data'] -%}
{%- set states_status = states['status'] -%}

The contents of the returned data field is a matrix: it contains a list of rows, with each row being a list of the values returned by the database. For the above example, this data structure might look like:

>>> log(states_data)
[['CA'], ['NY'], ['PA'], ...]
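Because the data field is a list of rows, it can be iterated directly in Jinja. A sketch that builds an in-list from the first column of each returned row:

```sql
{%- set states = load_result('states') -%}

select *
from {{ ref('users') }}
where state in (
    {%- for row in states['data'] -%}
        '{{ row[0] }}' {%- if not loop.last -%} , {%- endif -%}
    {%- endfor -%}
)
```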


Volatile API

While the statement and load_result setup works for now, we intend to improve this interface in the future. If you have questions or suggestions, please let us know on GitHub or on Slack.

The config variable exists to handle end-user configuration for custom materializations. Configs like sql_where and unique_key can be implemented using the config variable in your own materializations.

For example, code in the incremental materialization like this:

{% materialization incremental, default -%}
  {%- set sql_where = config.require('sql_where') -%}
  {%- set unique_key = config.get('unique_key') -%}

is responsible for handling model code that looks like this:

{{
  config(
    sql_where='updated_at > (select max(updated_at) from {{ this }})'
  )
}}




  • name: The name of the configuration variable (required)
  • default: The default value to use if this configuration is not provided (optional)

The config.get function is used to get configurations for a model from the end-user. Configs defined in this way are optional, and a default value can be provided.

Example usage:

{% materialization incremental, default -%}
  -- Example w/ no default. unique_key will be None if the user does not provide this configuration
  {%- set unique_key = config.get('unique_key') -%}

  -- Example w/ default value. Default to 'id' if 'unique_key' not provided
  {%- set unique_key = config.get('unique_key', default='id') -%}

(Source on Github)
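On the model side, these configurations are supplied through a config() block at the top of the model. A sketch of a model providing the unique_key configuration read above (the ref name is hypothetical):

```sql
{{
  config(
    materialized='incremental',
    unique_key='id'
  )
}}

select * from {{ ref('raw_events') }}
```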





  • name: The name of the configuration variable (required)

The config.require function is used to get configurations for a model from the end-user. Configs defined using this function are required, and failure to provide them will result in a compilation error.

Example usage:

{% materialization incremental, default -%}
  {%- set sql_where = config.require('sql_where') -%}

(Source on Github)


The dbt API


While dbt is typically invoked from the command line, it is possible to import and invoke dbt as a Python module. This API is presently undocumented and is liable to change in the future. Please contact us if you choose to run dbt in this manner!


Exit Codes


When dbt exits, it will return an exit code of either 0, 1, or 2.

Exit Code 0: The dbt invocation completed without error.

Exit Code 1: The dbt invocation completed with at least one handled error (eg. model syntax error, bad permissions, etc). The run was completed, but some models may have been skipped.

Exit Code 2: The dbt invocation completed with an unhandled error (eg. ctrl-c, network interruption, etc).

While these exit codes may change in the future, a zero exit code will always imply success whereas a nonzero exit code will always imply failure.