
Snowflake and Apache Iceberg

dbt supports materializing tables in the Iceberg table format in two different ways:

  • The model configuration field table_format = 'iceberg' (legacy), as sketched below
  • A catalog integration, configured in the SQL config (inside the .sql model file), a property file (model folder), or the project file (dbt_project.yml)
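
For reference, a minimal sketch of the legacy approach; the external volume and model names here are placeholders, and the volume must already exist in your account:

{{
  config(
    materialized='table',
    table_format='iceberg',
    external_volume='my_external_volume'
  )
}}

select * from {{ ref('my_upstream_model') }}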
Catalog integration configuration

To use a catalog integration, create a catalogs.yml file and apply that integration at the config level.

Refer to Snowflake configurations for more information.

We recommend using the Iceberg catalog configuration and applying the catalog in the model config for ease of use and to future-proof your code. Using table_format = 'iceberg' directly on the model configuration is a legacy approach and limits you to Snowflake Horizon as the catalog. Catalog support is available in dbt v1.10 and later.
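
If you want to apply a catalog to an entire folder of models rather than one model at a time, the same catalog_name config can also be set in dbt_project.yml. A minimal sketch, assuming a project named my_project and the catalog_horizon integration defined later in this guide:

models:
  my_project:
    +catalog_name: catalog_horizon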

Creating Iceberg Tables

dbt supports creating Iceberg tables for three of the Snowflake materializations: table, incremental, and dynamic table.

Iceberg catalogs

Snowflake has support for Iceberg tables via built-in and external catalogs, including:

  • Snowflake Horizon (the built-in catalog)
  • Polaris/Open Catalog (managed Polaris)
  • Glue Data Catalog (not supported in dbt-snowflake)
  • Iceberg REST Compatible

dbt supports the Snowflake built-in catalog and Iceberg REST-compatible catalogs (including Polaris and Unity Catalog) on dbt-snowflake.

To use an externally managed catalog (anything other than the built-in catalog), you must set up a catalog integration by running a SQL command similar to the example in the next section.

External catalogs

Example configurations for external catalogs.

You must set up a catalog integration to use Polaris/Open Catalog (managed Polaris).

Example code:


CREATE CATALOG INTEGRATION my_polaris_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = '<open_catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

Executing this registers the external Polaris catalog with Snowflake. Once configured, dbt can create Iceberg tables in Snowflake that register the new database object with the catalog as metadata, and it can query Polaris-managed tables.

After you have created the external catalog integration, you will be able to do two things:

  • Query an externally managed table: Snowflake can query Iceberg tables whose metadata lives in the external catalog. In this scenario, Snowflake is a "reader" of the external catalog. The table's data remains in external cloud storage (such as an Amazon S3 or Google Cloud Storage bucket) as defined in the catalog storage configuration. Snowflake uses the catalog integration to fetch metadata via the REST API and then reads the data files from cloud storage.

  • Sync Snowflake-managed tables to an external catalog: You can create a Snowflake Iceberg table that Snowflake manages via a cloud storage location and then register/sync that table to the external catalog. This allows other engines to discover the table.
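
As a rough illustration of the two scenarios, here are hedged Snowflake SQL sketches. The object names (polaris_orders, my_namespace, and so on) are placeholders, and the exact parameters accepted can vary; check the Snowflake documentation for your account:

-- Scenario 1: register a read-only Iceberg table whose metadata
-- lives in the external (Polaris) catalog
CREATE ICEBERG TABLE analytics.public.polaris_orders
  EXTERNAL_VOLUME = 'dbt_external_volume'
  CATALOG = 'my_polaris_catalog_int'
  CATALOG_NAMESPACE = 'my_namespace'
  CATALOG_TABLE_NAME = 'orders';

-- Scenario 2: create a Snowflake-managed Iceberg table and sync it
-- out to the external catalog so other engines can discover it
CREATE ICEBERG TABLE analytics.public.customers
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'dbt_external_volume'
  BASE_LOCATION = 'customers/'
  CATALOG_SYNC = 'my_polaris_catalog_int'
  AS SELECT * FROM analytics.public.raw_customers;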

dbt Catalog Integration Configurations for Snowflake

The following table outlines the configuration fields required to set up a catalog integration for Iceberg tables in Snowflake.

| Field | Required | Accepted values |
|---|---|---|
| name | yes | Name of the catalog integration |
| catalog_name | yes | The name of the catalog integration in Snowflake (for example, my_dbt_iceberg_catalog) |
| external_volume | yes | <external_volume_name> |
| table_format | yes | iceberg |
| catalog_type | yes | built_in, iceberg_rest |
| adapter_properties | optional | See below |

You can connect to external Iceberg-compatible catalogs, such as Polaris and Unity Catalog, via the iceberg_rest catalog_type. Please note that we only support Iceberg REST with catalog-linked databases.
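
The catalog-linked database itself is created in Snowflake before dbt gets involved. A hedged sketch, reusing the my_polaris_catalog_int integration from the earlier example; the database and volume names are placeholders, and the exact syntax may vary with your Snowflake edition:

CREATE DATABASE my_catalog_linked_db
  LINKED_CATALOG = (
    CATALOG = 'my_polaris_catalog_int'
  )
  EXTERNAL_VOLUME = 'dbt_external_volume';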

Adapter Properties

These are the additional configurations, unique to Snowflake, that can be supplied and nested under adapter_properties.

| Field | Required | Accepted values |
|---|---|---|
| storage_serialization_policy | optional | COMPATIBLE or OPTIMIZED |
| max_data_extension_time_in_days | optional | 0 to 90 (default 14) |
| data_retention_time_in_days | optional | Standard accounts: 1; Enterprise or higher: 0 to 90 (default 1) |
| change_tracking | optional | True or False |
| catalog_linked_database | Required if you are using the iceberg_rest catalog type | The catalog-linked database name |

  • storage_serialization_policy: The serialization policy tells Snowflake what kind of encoding and compression to perform on the table data files. If not specified at table creation, the table inherits the value set at the schema, database, or account level. If the value isn’t specified at any level, the table uses the default value. You can’t change the value of this parameter after table creation.
  • max_data_extension_time_in_days: The maximum number of days Snowflake can extend the data retention period for tables to prevent streams on the tables from becoming stale. The MAX_DATA_EXTENSION_TIME_IN_DAYS parameter enables you to limit this automatic extension period, either to control the storage costs of data retention or for compliance reasons.
  • data_retention_time_in_days: For managed Iceberg tables, you can set a retention period for Snowflake Time Travel and undropping the table over the default account values. For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.
  • change_tracking: Specifies whether to enable change tracking on the table.
  • catalog_linked_database: Catalog-linked databases (CLDs) in Snowflake ensure that Snowflake automatically syncs metadata (including namespaces and Iceberg tables) from the external Iceberg catalog and registers them as remote tables in the catalog-linked database. We require catalog-linked databases for building Iceberg tables with external catalogs because, without one, dbt cannot truly manage the table end to end. Snowflake does not support dropping an Iceberg table in the external catalog from a non-CLD; it only allows unlinking the Snowflake table, which creates a discrepancy with how dbt expects to manage the materialized object.
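
Putting this together, a hedged catalogs.yml sketch for an Iceberg REST integration; the integration, volume, and database names are placeholders carried over from the earlier examples:

catalogs:
  - name: catalog_polaris
    active_write_integration: polaris_write_integration
    write_integrations:
      - name: polaris_write_integration
        catalog_name: my_polaris_catalog_int
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: iceberg_rest
        adapter_properties:
          catalog_linked_database: my_catalog_linked_db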

Configure catalog integration for managed Iceberg tables

  1. Create a catalogs.yml at the top level of your dbt project.

    An example of Snowflake Horizon as the catalog:

catalogs:
  - name: catalog_horizon
    active_write_integration: snowflake_write_integration
    write_integrations:
      - name: snowflake_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: built_in
        adapter_properties:
          change_tracking: 'True'

  2. Add the catalog_name config parameter in either the SQL config (inside the .sql model file), a property file (model folder), or your dbt_project.yml.

    An example of iceberg_model.sql:

{{
  config(
    materialized='table',
    catalog_name='catalog_horizon'
  )
}}

select * from {{ ref('jaffle_shop_customers') }}
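
    Alternatively, the same config can live in a property file instead of the model's SQL. A minimal sketch, assuming a models/schema.yml:

models:
  - name: iceberg_model
    config:
      materialized: table
      catalog_name: catalog_horizon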

  3. Execute the dbt model with dbt run -s iceberg_model.

For more information, refer to our documentation on Snowflake configurations.

Limitations

For external catalogs, Snowflake only supports read-only access: it can query the table but cannot insert or modify data.

The syncing experience differs depending on the catalog you choose. Some catalogs refresh automatically, and you can set parameters on your catalog integration to control this. Other catalogs might require a separate job to manage the metadata sync.
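
For catalogs that don't refresh automatically, Snowflake supports manually refreshing the metadata of an Iceberg table that uses an external catalog. A hedged example, reusing the placeholder table from earlier:

ALTER ICEBERG TABLE analytics.public.polaris_orders REFRESH;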
