Snowflake configurations
Transient tables
Snowflake supports the creation of transient tables. Snowflake does not preserve a history for these tables, which can result in a measurable reduction of your Snowflake storage costs. Transient tables support Time Travel only to a limited degree: the retention period is 1 day by default, and there is no Fail-safe period. Weigh these tradeoffs when deciding whether or not to configure your dbt models as `transient`. By default, all Snowflake tables created by dbt are `transient`.
Configuring transient tables in dbt_project.yml
A whole folder (or package) can be configured to be transient (or not) by adding a line to the `dbt_project.yml` file. This config works just like all of the model configs defined in `dbt_project.yml`.
name: my_project
...
models:
  +transient: false
  my_project:
    ...
Configuring transience for a specific model
A specific model can be configured to be transient by setting the `transient` model config to `true`.
{{ config(materialized='table', transient=true) }}
select * from ...
Query tags
Query tags are a Snowflake parameter that can be quite useful later on when searching in the QUERY_HISTORY view.
dbt supports setting a default query tag for the duration of its Snowflake connections in your profile. You can set more precise values (and override the default) for subsets of models by setting a `query_tag` model config or by overriding the default `set_query_tag` macro:
models:
  <resource-path>:
    +query_tag: dbt_special
{{ config(
    query_tag = 'dbt_special'
) }}
select ...
The following example overrides the `set_query_tag` macro so that every query run for a model is tagged with that model's name.
{% macro set_query_tag() -%}
  {% set new_query_tag = model.name %}
  {% if new_query_tag %}
    {% set original_query_tag = get_current_query_tag() %}
    {{ log("Setting query_tag to '" ~ new_query_tag ~ "'. Will reset to '" ~ original_query_tag ~ "' after materialization.") }}
    {% do run_query("alter session set query_tag = '{}'".format(new_query_tag)) %}
    {{ return(original_query_tag)}}
  {% endif %}
  {{ return(none)}}
{% endmacro %}
Note: query tags are set at the session level. At the start of each model materialization, if the model has a custom `query_tag` configured, dbt will run `alter session set query_tag` to set the new value. At the end of the materialization, dbt will run another `alter` statement to reset the tag to its default value. As such, build failures midway through a materialization may result in subsequent queries running with an incorrect tag.
Merge behavior (incremental models)
The `incremental_strategy` config controls how dbt builds incremental models. By default, dbt will use a `merge` statement on Snowflake to refresh incremental tables.
Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`.
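As a minimal sketch of that setting, an incremental model might opt into the two-step strategy as shown here. The `event_id` key, the `loaded_at` column, and the `stg_events` model are hypothetical names used only for illustration; substitute your own.

```sql
-- Hypothetical model: event_id, loaded_at, and stg_events are placeholder names.
{{
  config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='delete+insert'
  )
}}

select *
from {{ ref('stg_events') }}

{% if is_incremental() %}
  -- On incremental runs, only pick up rows newer than what is already loaded.
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```

With `delete+insert`, rows in the target that match the incoming `unique_key` values are deleted first and the new rows are then inserted, so duplicate keys in the staged data do not trigger the merge error. Any duplicates in the staged data will still land in the table, so it is worth checking why the key is not unique.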
Configuring table clustering
dbt supports table clustering on Snowflake. To control clustering for a table or incremental model, use the `cluster_by` config. When this configuration is applied, dbt will do two things:
- It will implicitly order the table results by the specified `cluster_by` fields
- It will add the specified clustering keys to the target table
By using the specified `cluster_by` fields to order the table, dbt minimizes the amount of work required by Snowflake's automatic clustering functionality. If an incremental model is configured to use table clustering, then dbt will also order the staged dataset before merging it into the destination table. As such, the dbt-managed table should always be in a mostly clustered state.
Using cluster_by
The `cluster_by` config accepts either a string or a list of strings to use as clustering keys. The following example will create a sessions table that is clustered by the `session_start` column.
{{
  config(
    materialized='table',
    cluster_by=['session_start']
  )
}}
select
  session_id,
  min(event_time) as session_start,
  max(event_time) as session_end,
  count(*) as count_pageviews
from {{ source('snowplow', 'event') }}
group by 1
The code above will be compiled to SQL that looks (approximately) like this:
create or replace table my_database.my_schema.my_table as (
  select * from (
    select
      session_id,
      min(event_time) as session_start,
      max(event_time) as session_end,
      count(*) as count_pageviews
    from {{ source('snowplow', 'event') }}
    group by 1
  )
  -- this order by is added by dbt in order to create the
  -- table in an already-clustered manner.
  order by session_start
);
alter table my_database.my_schema.my_table cluster by (session_start);
Automatic clustering
Automatic clustering is enabled by default in Snowflake today, so no action is needed to make use of it. Though there is an `automatic_clustering` config, it has no effect except for accounts with (deprecated) manual clustering enabled.
If manual clustering is still enabled for your account, you can use the `automatic_clustering` config to control whether or not automatic clustering is enabled for dbt models. When `automatic_clustering` is set to `true`, dbt will run an `alter table <table name> resume recluster` query after building the target table.
The `automatic_clustering` config can be specified in the `dbt_project.yml` file, or in a model `config()` block.
models:
  +automatic_clustering: true
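For the model-level alternative, a `config()` block might look like the following sketch. It reuses the sessions query from the `cluster_by` example and assumes the account still has manual clustering enabled; otherwise the setting has no effect.

```sql
-- Sketch only: automatic_clustering matters only on accounts with
-- (deprecated) manual clustering enabled.
{{
  config(
    materialized='table',
    cluster_by=['session_start'],
    automatic_clustering=true
  )
}}

select
  session_id,
  min(event_time) as session_start,
  max(event_time) as session_end,
  count(*) as count_pageviews
from {{ source('snowplow', 'event') }}
group by 1
```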
Configuring virtual warehouses
The default warehouse that dbt uses can be configured in your profile for Snowflake connections. To override the warehouse that is used for specific models (or groups of models), use the `snowflake_warehouse` model configuration. This configuration can be used to specify a larger warehouse for certain models in order to control Snowflake costs and project build times.
The following config uses the `EXTRA_SMALL` warehouse for all models in the project, except for the models in the `clickstream` folder, which are configured to use the `EXTRA_LARGE` warehouse. In the same example, all snapshots are configured to use the `EXTRA_LARGE` warehouse.
name: my_project
version: 1.0.0
...
models:
  +snowflake_warehouse: "EXTRA_SMALL"
  my_project:
    clickstream:
      +snowflake_warehouse: "EXTRA_LARGE"
snapshots:
  +snowflake_warehouse: "EXTRA_LARGE"
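The same override can also be applied to a single model from its own file. The sketch below assumes a warehouse named `EXTRA_LARGE` exists, as in the project-level example, and reuses the `snowplow` source from the clustering section purely for illustration.

```sql
-- Sketch: run just this model on the larger warehouse.
-- EXTRA_LARGE must exist and be usable by the role dbt connects with.
{{
  config(
    materialized='table',
    snowflake_warehouse='EXTRA_LARGE'
  )
}}

select
  session_id,
  count(*) as count_pageviews
from {{ source('snowplow', 'event') }}
group by 1
```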
Copying grants
When the `copy_grants` config is set to `true`, dbt will add the `copy grants` DDL qualifier when rebuilding tables and views. The default value is `false`.
models:
  +copy_grants: true
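To enable this for one model rather than the whole project, the config can be set in the model file. This is a minimal sketch; the `stg_orders` model is a hypothetical name.

```sql
-- Sketch: keep existing grants when this table is rebuilt.
-- stg_orders is a placeholder model name.
{{
  config(
    materialized='table',
    copy_grants=true
  )
}}

select *
from {{ ref('stg_orders') }}
```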
Secure views
To create a Snowflake secure view, use the `secure` config for view models. Secure views can be used to limit access to sensitive data. Note: secure views may incur a performance penalty, so you should only use them if you need them.
The following example configures the models in the `sensitive/` folder as secure views.
name: my_project
version: 1.0.0
models:
  my_project:
    sensitive:
      +materialized: view
      +secure: true
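A single model can be marked secure in the same way from its own `config()` block. In this sketch, the `customers` model and its columns are hypothetical and stand in for whatever sensitive data the view exposes.

```sql
-- Sketch: build this model as a secure view.
-- customers, customer_id, and region are placeholder names.
{{
  config(
    materialized='view',
    secure=true
  )
}}

select
  customer_id,
  region
from {{ ref('customers') }}
```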