Snowflake configurations

Iceberg table format

The dbt-snowflake adapter supports the Iceberg table format. It is available for three of the Snowflake materializations:

For now, to create Iceberg tables, you must implement a behavior flag due to performance impact related to using Iceberg tables. Snowflake does not support is_iceberg on the Show Objects query, which dbt depends on for metadata.

To use Iceberg, set the enable_iceberg_materializations flag to True in your dbt_project.yml:

dbt_project.yml

flags:
  enable_iceberg_materializations: True

The following configurations are supported. For more information, check out the Snowflake reference for CREATE ICEBERG TABLE (Snowflake as the catalog).

Parameter	Type	Required	Description	Sample input	Note
`table_format`	String	Yes	Configures the objects table format.	`iceberg`	`iceberg` is the only accepted value.
`external_volume`	String	Yes(*)	Specifies the identifier (name) of the external volume where Snowflake writes the Iceberg table's metadata and data files.	`my_s3_bucket`	*You don't need to specify this if the account, database, or schema already has an associated external volume. More info
`base_location_root`	String	No	If provided, the input will override the default dbt base_location value of `_dbt`
`base_location_subpath`	String	No	An optional suffix to add to the `base_location` path that dbt automatically specifies.	`jaffle_marketing_folder`	We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See Base Location for more info.

Example configuration

To configure an Iceberg table materialization in dbt, refer to the example configuration:

models/<modelname>.sql

{{
  config(
    materialized = "table",
    table_format="iceberg",
    external_volume="s3_iceberg_snow",
  )
}}

select * from {{ ref('raw_orders') }}

Base location

Snowflake's CREATE ICEBERG TABLE DDL requires that a base_location be provided. dbt defines this parameter on the user's behalf to streamline usage and enforce basic isolation of table data within the EXTERNAL VOLUME. The default behavior in dbt is to provide a base_location string of the form: _dbt/{SCHEMA_NAME}/{MODEL_NAME}.

We recommend using the default behavior, but if you need to customize the resulting base_location, dbt allows users to configure the base_location with the base_location_root and base_location_subpath.

If no inputs are provided, dbt will output for base_location {{ external_volume }}/_dbt/{{ schema }}/{{ model_name }}
If base_location_root = foo, dbt will output {{ external_volume }}/foo/{{ schema }}/{{ model_name }}
If base_location_subpath = bar, dbt will output {{ external_volume }}/_dbt/{{ schema }}/{{ model_name }}/bar
If base_location = foo and base_location_subpath = bar, dbt will output {{ external_volume }}/foo/{{ schema }}/{{ model_name }}/bar

A theoretical (but not recommended) use case is re-using an EXTERNAL VOLUME while maintaining isolation across development and production environments. We recommend against this as storage permissions should configured on the external volume and underlying storage, not paths that any analytics engineer can modify.

Rationale

By default, dbt manages base_location on behalf of users to enforce best practices. With Snowflake-managed Iceberg format tables, the user owns and maintains the data storage of the tables in an external storage solution (the declared external volume). The base_ location parameter declares where to write the data within the external volume. The Snowflake Iceberg catalog keeps track of your Iceberg table regardless of where the data lives within the external volume declared and the base_location provided. However, Snowflake permits passing anything into the base_location field, including an empty string, even reusing the same path across multiple tables. This behavior could result in future technical debt because it will limit the ability to:

Navigate the underlying object store (S3/Azure blob)
Read Iceberg tables via an object-store integration
Grant schema-specific access to tables via object store
Use a crawler pointed at the tables within the external volume to build a new catalog with another tool

To maintain best practices, dbt enforces an input and, by default, writes your tables within a _dbt/{SCHEMA_NAME}/{TABLE_NAME} prefix to ensure easier object-store observability and auditability.

Limitations

There are some limitations to the implementation you need to be aware of:

Using Iceberg tables with dbt, the result is that your query is materialized in Iceberg. However, often, dbt creates intermediary objects as temporary and transient tables for certain materializations, such as incremental ones. It is not possible to configure these temporary objects also to be Iceberg-formatted. You may see non-Iceberg tables created in the logs to support specific materializations, but they will be dropped after usage.
You cannot incrementally update a preexisting incremental model to be an Iceberg table. To do so, you must fully rebuild the table with the --full-refresh flag.

Dynamic tables

The Snowflake adapter supports dynamic tables. This materialization is specific to Snowflake, which means that any model configuration that would normally come along for the ride from dbt-core (e.g. as with a view) may not be available for dynamic tables. This gap will decrease in future patches and versions. While this materialization is specific to Snowflake, it very much follows the implementation of materialized views. In particular, dynamic tables have access to the on_configuration_change setting. Dynamic tables are supported with the following configuration parameters:

Parameter	Type	Required	Default	Change Monitoring Support
`on_configuration_change`	`<string>`	no	`apply`	n/a
`target_lag`	`<string>`	yes		alter
`snowflake_warehouse`	`<string>`	yes		alter
`refresh_mode`	`<string>`	no	`AUTO`	refresh
`initialize`	`<string>`	no	`ON_CREATE`	n/a

Project file
Property file
Config block

dbt_project.yml

models:
  <resource-path>:
    +materialized: dynamic_table
    +on_configuration_change: apply | continue | fail
    +target_lag: downstream | <time-delta>
    +snowflake_warehouse: <warehouse-name>
    +refresh_mode: AUTO | FULL | INCREMENTAL
    +initialize: ON_CREATE | ON_SCHEDULE 

models/properties.yml

version: 2

models:
  - name: [<model-name>]
    config:
      materialized: dynamic_table
      on_configuration_change: apply | continue | fail
      target_lag: downstream | <time-delta>
      snowflake_warehouse: <warehouse-name>
      refresh_mode: AUTO | FULL | INCREMENTAL 
      initialize: ON_CREATE | ON_SCHEDULE 

models/<model_name>.sql

{{ config(
    materialized="dynamic_table",
    on_configuration_change="apply" | "continue" | "fail",
    target_lag="downstream" | "<integer> seconds | minutes | hours | days",
    snowflake_warehouse="<warehouse-name>",
    refresh_mode="AUTO" | "FULL" | "INCREMENTAL",
    initialize="ON_CREATE" | "ON_SCHEDULE", 

) }}

Learn more about these parameters in Snowflake's docs:

Target lag

Snowflake allows two configuration scenarios for scheduling automatic refreshes:

Time-based — Provide a value of the form <int> { seconds | minutes | hours | days }. For example, if the dynamic table needs to be updated every 30 minutes, use target_lag='30 minutes'.
Downstream — Applicable when the dynamic table is referenced by other dynamic tables. In this scenario, target_lag='downstream' allows for refreshes to be controlled at the target, instead of at each layer.

Learn more about target_lag in Snowflake's docs. Please note that Snowflake supports a target lag of 1 minute or longer.

Refresh mode

Snowflake allows three options for refresh mode:

AUTO — Enforces an incremental refresh of the dynamic table by default. If the CREATE DYNAMIC TABLE statement does not support the incremental refresh mode, the dynamic table is automatically created with the full refresh mode.
FULL — Enforces a full refresh of the dynamic table, even if the dynamic table can be incrementally refreshed.
INCREMENTAL — Enforces an incremental refresh of the dynamic table. If the query that underlies the dynamic table can’t perform an incremental refresh, dynamic table creation fails and displays an error message.

Learn more about refresh_mode in Snowflake's docs.

Initialize

Snowflake allows two options for initialize:

ON_CREATE — Refreshes the dynamic table synchronously at creation. If this refresh fails, dynamic table creation fails and displays an error message.
ON_SCHEDULE — Refreshes the dynamic table at the next scheduled refresh.

Learn more about initialize in Snowflake's docs.

Limitations

As with materialized views on most data platforms, there are limitations associated with dynamic tables. Some worth noting include:

Dynamic table SQL has a limited feature set.
Dynamic table SQL cannot be updated; the dynamic table must go through a --full-refresh (DROP/CREATE).
Dynamic tables cannot be downstream from: materialized views, external tables, streams.
Dynamic tables cannot reference a view that is downstream from another dynamic table.

Find more information about dynamic table limitations in Snowflake's docs.

For dbt limitations, these dbt features are not supported:

Troubleshooting dynamic tables

If your dynamic table model fails to rerun with the following error message after the initial execution:

SnowflakeDynamicTableConfig.__init__() missing 6 required positional arguments: 'name', 'schema_name', 'database_name', 'query', 'target_lag', and 'snowflake_warehouse'

Ensure that QUOTED_IDENTIFIERS_IGNORE_CASE on your account is set to FALSE.

Temporary tables

Incremental table merges for Snowflake prefer to utilize a view rather than a temporary table. The reasoning is to avoid the database write step that a temporary table would initiate and save compile time.

However, some situations remain where a temporary table would achieve results faster or more safely. The tmp_relation_type configuration enables you to opt in to temporary tables for incremental builds. This is defined as part of the model configuration.

To guarantee accuracy, an incremental model using the delete+insert strategy with a unique_key defined requires a temporary table; trying to change this to a view will result in an error.

Defined in the project YAML:

dbt_project.yml

name: my_project

...

models:
  <resource-path>:
    +tmp_relation_type: table | view ## If not defined, view is the default.
  

In the configuration format for the model SQL file:

dbt_model.sql

{{ config(
    tmp_relation_type="table | view", ## If not defined, view is the default.
) }}

Transient tables

Snowflake supports the creation of transient tables. Snowflake does not preserve a history for these tables, which can result in a measurable reduction of your Snowflake storage costs. Transient tables participate in time travel to a limited degree with a retention period of 1 day by default with no fail-safe period. Weigh these tradeoffs when deciding whether or not to configure your dbt models as transient. By default, all Snowflake tables created by dbt are transient.

Configuring transient tables in dbt_project.yml

A whole folder (or package) can be configured to be transient (or not) by adding a line to the dbt_project.yml file. This config works just like all of the model configs defined in dbt_project.yml.

dbt_project.yml

name: my_project

...

models:
  +transient: false
  my_project:
    ...

Configuring transience for a specific model

A specific model can be configured to be transient by setting the transient model config to true.

my_table.sql

{{ config(materialized='table', transient=true) }}

select * from ...

Query tags

Query tags are a Snowflake parameter that can be quite useful later on when searching in the QUERY_HISTORY view.

dbt supports setting a default query tag for the duration of its Snowflake connections in your profile. You can set more precise values (and override the default) for subsets of models by setting a query_tag model config or by overriding the default set_query_tag macro:

dbt_project.yml

models:
  <resource-path>:
    +query_tag: dbt_special

models/<modelname>.sql

{{ config(
    query_tag = 'dbt_special'
) }}

select ...

In this example, you can set up a query tag to be applied to every query with the model's name.

  {% macro set_query_tag() -%}
  {% set new_query_tag = model.name %} 
  {% if new_query_tag %}
    {% set original_query_tag = get_current_query_tag() %}
    {{ log("Setting query_tag to '" ~ new_query_tag ~ "'. Will reset to '" ~ original_query_tag ~ "' after materialization.") }}
    {% do run_query("alter session set query_tag = '{}'".format(new_query_tag)) %}
    {{ return(original_query_tag)}}
  {% endif %}
  {{ return(none)}}
{% endmacro %}

Note: query tags are set at the session level. At the start of each model materializationThe exact Data Definition Language (DDL) that dbt will use when creating the model’s equivalent in a data warehouse., if the model has a custom query_tag configured, dbt will run alter session set query_tag to set the new value. At the end of the materialization, dbt will run another alter statement to reset the tag to its default value. As such, build failures midway through a materialization may result in subsequent queries running with an incorrect tag.

Merge behavior (incremental models)

The incremental_strategy config controls how dbt builds incremental models. By default, dbt will use a merge statement on Snowflake to refresh incremental tables.

Snowflake supports the following incremental strategies:

Merge (default)
Append
Delete+insert
Insert_overwrite
- Note: This is not a standard dbt incremental strategy. insert_overwrite behaves like truncate + re-insert commands on Snowflake. It doesn't support partition-based overwrites, which means it'll overwrite the entire table intentionally. It's implemented as an incremental strategy because it aligns with dbt’s workflow of not dropping existing tables.
microbatch

Snowflake's merge statement fails with a "nondeterministic merge" error if the unique_key specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the incremental_strategy config for your model to delete+insert.

Configuring table clustering

dbt supports table clustering on Snowflake. To control clustering for a tableIn simplest terms, a table is the direct storage of data in rows and columns. Think excel sheet with raw values in each of the cells. or incremental model, use the cluster_by config. When this configuration is applied, dbt will do two things:

It will implicitly order the table results by the specified cluster_by fields
It will add the specified clustering keys to the target table

By using the specified cluster_by fields to order the table, dbt minimizes the amount of work required by Snowflake's automatic clustering functionality. If an incremental model is configured to use table clustering, then dbt will also order the staged dataset before merging it into the destination table. As such, the dbt-managed table should always be in a mostly clustered state.

Using cluster_by

The cluster_by config accepts either a string, or a list of strings to use as clustering keys. The following example will create a sessions table that is clustered by the session_start column.

models/events/sessions.sql

{{
  config(
    materialized='table',
    cluster_by=['session_start']
  )
}}

select
  session_id,
  min(event_time) as session_start,
  max(event_time) as session_end,
  count(*) as count_pageviews

from {{ source('snowplow', 'event') }}
group by 1

The code above will be compiled to SQL that looks (approximately) like this:

create or replace table my_database.my_schema.my_table as (

  select * from (
    select
      session_id,
      min(event_time) as session_start,
      max(event_time) as session_end,
      count(*) as count_pageviews

    from {{ source('snowplow', 'event') }}
    group by 1
  )

  -- this order by is added by dbt in order to create the
  -- table in an already-clustered manner.
  order by session_start

);

 alter table my_database.my_schema.my_table cluster by (session_start);

Automatic clustering

Automatic clustering is enabled by default in Snowflake today, no action is needed to make use of it. Though there is an automatic_clustering config, it has no effect except for accounts with (deprecated) manual clustering enabled.

If manual clustering is still enabled for your account, you can use the automatic_clustering config to control whether or not automatic clustering is enabled for dbt models. When automatic_clustering is set to true, dbt will run an alter table <table name> resume recluster query after building the target table.

The automatic_clustering config can be specified in the dbt_project.yml file, or in a model config() block.

dbt_project.yml

models:
  +automatic_clustering: true

Configuring virtual warehouses

The default warehouse that dbt uses can be configured in your Profile for Snowflake connections. To override the warehouse that is used for specific models (or groups of models), use the snowflake_warehouse model configuration. This configuration can be used to specify a larger warehouse for certain models in order to control Snowflake costs and project build times.

YAML code
SQL code

The example config below changes the warehouse for a group of models with a config argument in the yml.

dbt_project.yml

name: my_project
version: 1.0.0

...

models:
  +snowflake_warehouse: "EXTRA_SMALL"    # use the `EXTRA_SMALL` warehouse for all models in the project...
  my_project:
    clickstream:
      +snowflake_warehouse: "EXTRA_LARGE"    # ...except for the models in the `clickstream` folder, which will use the `EXTRA_LARGE` warehouse.

snapshots:
  +snowflake_warehouse: "EXTRA_LARGE"    # all Snapshot models are configured to use the `EXTRA_LARGE` warehouse.

The example config below changes the warehouse for a single model with a config() block in the sql model.

models/events/sessions.sql

{{
  config(
    materialized='table',
    snowflake_warehouse='EXTRA_LARGE'
  )
}}

with

aggregated_page_events as (

    select
        session_id,
        min(event_time) as session_start,
        max(event_time) as session_end,
        count(*) as count_page_views
    from {{ source('snowplow', 'event') }}
    group by 1

),

index_sessions as (

    select
        *,
        row_number() over (
            partition by session_id
            order by session_start
        ) as page_view_in_session_index
    from aggregated_page_events

)

select * from index_sessions

Copying grants

When the copy_grants config is set to true, dbt will add the copy grants DDLData Definition Language (DDL) is a group of SQL statements that you can execute to manage database objects, including tables, views, and more. qualifier when rebuilding tables and viewsA view (as opposed to a table) is a defined passthrough SQL query that can be run against a database (or data warehouse).. The default value is false.

dbt_project.yml

models:
  +copy_grants: true

Secure views

To create a Snowflake secure view, use the secure config for view models. Secure views can be used to limit access to sensitive data. Note: secure views may incur a performance penalty, so you should only use them if you need them.

The following example configures the models in the sensitive/ folder to be configured as secure views.

dbt_project.yml

name: my_project
version: 1.0.0

models:
  my_project:
    sensitive:
      +materialized: view
      +secure: true

Source freshness known limitation

Snowflake calculates source freshness using information from the LAST_ALTERED column, meaning it relies on a field updated whenever any object undergoes modification, not only data updates. No action must be taken, but analytics teams should note this caveat.

Per the Snowflake documentation:

The LAST_ALTERED column is updated when the following operations are performed on an object:

DDL operations.

DML operations (for tables only).

Background maintenance operations on metadata performed by Snowflake.

Pagination for object results

By default, when dbt encounters a schema with up to 100,000 objects, it will paginate the results from show objects at 10,000 per page for up to 10 pages.

Environments with more than 100,000 objects in a schema can customize the number of results per page and the page limit using the following flags in the dbt_project.yml:

list_relations_per_page — The number of relations on each page (Max 10k as this is the most Snowflake allows).
list_relations_page_limit — The maximum number of pages to include in the results.

For example, if you wanted to include 10,000 objects per page and include up to 100 pages (1 million objects), configure the flags as follows:

flags:
  list_relations_per_page: 10000
  list_relations_page_limit: 100

Iceberg table format​

Example configuration​

Base location​

Rationale​

Limitations​

Dynamic tables​

Target lag​

Refresh mode​

Initialize​

Limitations​

Troubleshooting dynamic tables​

Temporary tables​

Transient tables​

Configuring transient tables in dbt_project.yml​

Configuring transience for a specific model​

Query tags​

Merge behavior (incremental models)​

Configuring table clustering​

Using cluster_by​

Automatic clustering​

Configuring virtual warehouses​

Copying grants​

Secure views​

Source freshness known limitation​

Pagination for object results​

Iceberg table format

Example configuration

Base location

Rationale

Limitations

Dynamic tables

Target lag

Refresh mode

Initialize

Limitations

Troubleshooting dynamic tables

Temporary tables

Transient tables

Configuring transient tables in dbt_project.yml

Configuring transience for a specific model

Query tags

Merge behavior (incremental models)

Configuring table clustering

Using cluster_by

Automatic clustering

Configuring virtual warehouses

Copying grants

Secure views

Source freshness known limitation

Pagination for object results