Snowflake and Apache Iceberg
dbt supports materializing models in the Iceberg table format in two different ways:
- The model configuration field `table_format = 'iceberg'` (legacy)
- A catalog integration, configured in the SQL config (inside the `.sql` model file), a property file (model folder), or the project file (`dbt_project.yml`)

To use a catalog integration, you need to create a `catalogs.yml` file and apply that integration at the config level. Refer to Snowflake configurations for more information.

We recommend using the Iceberg catalog configuration and applying the catalog in the model config for ease of use and to future-proof your code. Using `table_format = 'iceberg'` directly on the model configuration is a legacy approach and limits you to Snowflake Horizon as the catalog. Catalog support is available on dbt 1.10+.
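For reference, the legacy approach sets the format directly in the model config. A minimal sketch, assuming an external volume named `dbt_external_volume` already exists in your Snowflake account:

{{
  config(
    materialized='table',
    table_format='iceberg',
    external_volume='dbt_external_volume'
  )
}}

select * from {{ ref('jaffle_shop_customers') }}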
Creating Iceberg Tables
dbt supports creating Iceberg tables for three of the Snowflake materializations: table, incremental, and dynamic table.
Iceberg catalogs
Snowflake has support for Iceberg tables via built-in and external catalogs, including:
- Snowflake Horizon (the built-in catalog)
- Polaris/Open Catalog (managed Polaris)
- Glue Data Catalog (Not supported in dbt-snowflake)
- Iceberg REST-compatible catalogs
dbt supports the Snowflake built-in catalog and Iceberg REST-compatible catalogs (including Polaris and Unity Catalog) on dbt-snowflake.
To use an externally managed catalog (anything other than the built-in catalog), you must set up a catalog integration by running a SQL command similar to the following examples.
External catalogs
Example configurations for external catalogs.
- Polaris/Open Catalog
- Glue Data Catalog
- Iceberg REST API
You must set up a catalog integration to use Polaris/Open Catalog (managed Polaris).
Example code:
CREATE CATALOG INTEGRATION my_polaris_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = '<open_catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client_id>'
    OAUTH_CLIENT_SECRET = '<client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
Executing this registers the external Polaris catalog with Snowflake. Once configured, dbt can create Iceberg tables in Snowflake that are registered with the Polaris catalog as metadata, and Snowflake can query Polaris-managed tables.
To configure Glue Data Catalog as the external catalog, you will need to set up two prerequisites:

- Create an AWS IAM role for Glue access: Configure AWS permissions so Snowflake can read the Glue catalog. This typically means creating an AWS IAM role that Snowflake will assume, with policies allowing Glue catalog read operations (at minimum, `glue:GetTable` and `glue:GetTables` on the relevant Glue databases). Attach a trust policy that enables Snowflake to assume this role (via an external ID).
- Set up the catalog integration: In Snowflake, create a catalog integration of type GLUE. This registers the Glue Data Catalog information and the IAM role with Snowflake. For example:
CREATE CATALOG INTEGRATION my_glue_catalog_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'dbt_database'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/myGlueRole'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-2'
  ENABLED = TRUE;
Glue Data Catalog also supports the Iceberg REST specification, so you can connect to Glue via the Iceberg REST API, as sketched below.
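For example, a REST-based Glue integration might look like the following sketch. The endpoint format and the SigV4 parameter names are assumptions to verify against Snowflake's CREATE CATALOG INTEGRATION documentation:

CREATE CATALOG INTEGRATION my_glue_rest_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'dbt_database'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-east-2.amazonaws.com/iceberg' -- AWS Glue Iceberg REST endpoint
    CATALOG_NAME = '123456789012' -- the AWS account ID acts as the catalog name
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/myGlueRole'
    SIGV4_SIGNING_REGION = 'us-east-2'
  )
  ENABLED = TRUE;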
You can set up an integration for any catalog that is compatible with the open-source Apache Iceberg REST specification.
Example code:
CREATE CATALOG INTEGRATION my_iceberg_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'dbt_database'
  REST_CONFIG = (
    restConfigParams
  )
  REST_AUTHENTICATION = (
    restAuthenticationParams
  )
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = <value>
  COMMENT = 'catalog integration for dbt iceberg tables';
For Unity Catalog with a bearer token:
CREATE OR REPLACE CATALOG INTEGRATION my_unity_catalog_int_pat
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'my_namespace'
  REST_CONFIG = (
    CATALOG_URI = 'https://my-api/api/2.1/unity-catalog/iceberg'
    CATALOG_NAME = '<catalog_name>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<bearer_token>'
  )
  ENABLED = TRUE;
After you have created the external catalog integration, you will be able to do two things (see the sketch after this list):

- Query an externally managed table: Snowflake can query Iceberg tables whose metadata lives in the external catalog. In this scenario, Snowflake is a "reader" of the external catalog. The table's data remains in external cloud storage (an AWS S3 or GCP bucket) as defined in the catalog storage configuration. Snowflake uses the catalog integration to fetch metadata via the REST API, then reads the data files from cloud storage.
- Sync Snowflake-managed tables to an external catalog: You can create a Snowflake Iceberg table that Snowflake manages via a cloud storage location and then register/sync that table to the external catalog. This allows other engines to discover the table.
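As an illustration, both directions can be exercised directly in Snowflake SQL. The table and integration names below are hypothetical, and the exact parameters should be checked against Snowflake's CREATE ICEBERG TABLE documentation:

-- Reader direction: register an externally managed Iceberg table in Snowflake,
-- then query it. Metadata comes from the external catalog via the integration.
CREATE ICEBERG TABLE my_external_table
  EXTERNAL_VOLUME = 'dbt_external_volume'
  CATALOG = 'my_polaris_catalog_int'
  CATALOG_TABLE_NAME = 'orders';

SELECT count(*) FROM my_external_table;

-- Sync direction: create a Snowflake-managed Iceberg table and register it
-- with the external catalog so other engines can discover it.
CREATE ICEBERG TABLE my_managed_table (id INT)
  CATALOG = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'dbt_external_volume'
  BASE_LOCATION = 'my_managed_table'
  CATALOG_SYNC = 'my_polaris_catalog_int';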
dbt Catalog Integration Configurations for Snowflake
The following table outlines the configuration fields required to set up a catalog integration for Iceberg tables in Snowflake.
| Field | Required | Accepted values |
|---|---|---|
| `name` | yes | Name of the catalog integration |
| `catalog_name` | yes | The name of the catalog integration in Snowflake (for example, `my_dbt_iceberg_catalog`) |
| `external_volume` | yes | `<external_volume_name>` |
| `table_format` | yes | `iceberg` |
| `catalog_type` | yes | `built_in`, `iceberg_rest` |
| `adapter_properties` | optional | See below |
You can connect to external Iceberg-compatible catalogs, such as Polaris and Unity Catalog, via the `iceberg_rest` catalog type. Note that Iceberg REST is only supported with catalog-linked databases.
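For instance, pairing the `iceberg_rest` catalog type with a catalog-linked database might look like the following hypothetical `catalogs.yml` entry (all names are illustrative):

catalogs:
  - name: catalog_polaris
    active_write_integration: polaris_write_integration
    write_integrations:
      - name: polaris_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: iceberg_rest
        adapter_properties:
          catalog_linked_database: my_catalog_linked_db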
Adapter Properties
These are the additional configurations, unique to Snowflake, that can be supplied and nested under `adapter_properties`.
| Field | Required | Accepted values |
|---|---|---|
| `storage_serialization_policy` | Optional | `COMPATIBLE` or `OPTIMIZED` |
| `max_data_extension_time_in_days` | Optional | `0` to `90`, with a default of `14` |
| `data_retention_time_in_days` | Optional | Standard account: `1`; Enterprise or higher: `0` to `90`, default `1` |
| `change_tracking` | Optional | `True` or `False` |
| `catalog_linked_database` | Required if you are using the `iceberg_rest` catalog type | Catalog-linked database name |
- `storage_serialization_policy`: The serialization policy tells Snowflake what kind of encoding and compression to perform on the table data files. If not specified at table creation, the table inherits the value set at the schema, database, or account level. If the value isn't specified at any level, the table uses the default value. You can't change the value of this parameter after table creation.
- `max_data_extension_time_in_days`: The maximum number of days Snowflake can extend the data retention period for tables to prevent streams on the tables from becoming stale. The `MAX_DATA_EXTENSION_TIME_IN_DAYS` parameter enables you to limit this automatic extension period, either to control storage costs for data retention or for compliance reasons.
- `data_retention_time_in_days`: For managed Iceberg tables, you can set a retention period for Snowflake Time Travel and undropping the table that overrides the default account values. For tables that use an external catalog, Snowflake uses the value of the `DATA_RETENTION_TIME_IN_DAYS` parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.
- `change_tracking`: Specifies whether to enable change tracking on the table.
- `catalog_linked_database`: A catalog-linked database (CLD) in Snowflake automatically syncs metadata (including namespaces and Iceberg tables) from the external Iceberg catalog and registers them as remote tables in the catalog-linked database. dbt requires a catalog-linked database for building Iceberg tables with external catalogs because, without one, dbt cannot truly manage the table end-to-end: Snowflake does not support dropping an Iceberg table in the external catalog from a non-CLD; it only allows unlinking the Snowflake table, which creates a discrepancy with how dbt expects to manage the materialized object. A creation sketch follows this list.
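Because the REST path requires a catalog-linked database, you would create one with a statement similar to the following sketch. The names are hypothetical, and the full option list is in Snowflake's catalog-linked database documentation:

CREATE DATABASE my_catalog_linked_db
  LINKED_CATALOG = (
    CATALOG = 'my_iceberg_catalog_int',     -- the catalog integration created earlier
    ALLOWED_NAMESPACES = ('dbt_database'),  -- optionally restrict which namespaces sync
    SYNC_INTERVAL_SECONDS = 60              -- how often Snowflake polls the external catalog
  )
  EXTERNAL_VOLUME = 'dbt_external_volume';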
Configure catalog integration for managed Iceberg tables
- Create a `catalogs.yml` file at the top level of your dbt project.
An example of Snowflake Horizon as the catalog:
catalogs:
  - name: catalog_horizon
    active_write_integration: snowflake_write_integration
    write_integrations:
      - name: snowflake_write_integration
        external_volume: dbt_external_volume
        table_format: iceberg
        catalog_type: built_in
        adapter_properties:
          change_tracking: 'True'
- Add the `catalog_name` config parameter in either the SQL config (inside the `.sql` model file), a property file (model folder), or your `dbt_project.yml`.

An example of `iceberg_model.sql`:
{{
  config(
    materialized='table',
    catalog_name='catalog_horizon'
  )
}}

select * from {{ ref('jaffle_shop_customers') }}
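The same configuration can also be set in a property file instead of the SQL config. A hypothetical `models/properties.yml` equivalent:

models:
  - name: iceberg_model
    config:
      materialized: table
      catalog_name: catalog_horizon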
- Execute the dbt model with `dbt run -s iceberg_model`.
For more information, refer to our documentation on Snowflake configurations.
Limitations
For external catalogs, Snowflake only supports `read`, which means it can query the table but cannot insert or modify data.
The syncing experience differs depending on the catalog you choose. Some catalogs are refreshed automatically, and you can control this with parameters on your catalog integration (such as `REFRESH_INTERVAL_SECONDS`). Other catalogs might require a separate job to manage the metadata sync.
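If your catalog is not refreshed automatically, one option is a manual metadata refresh. For example, Snowflake supports refreshing an externally managed Iceberg table (the table name here is hypothetical):

-- Pull the latest metadata for an externally managed Iceberg table
-- from the external catalog.
ALTER ICEBERG TABLE my_external_table REFRESH;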