Databricks and Apache Iceberg

Databricks is built on Delta Lake and stores data in the Delta table format by default. Databricks does not support writing to external Iceberg catalogs. It can, however, create both managed Iceberg tables and Iceberg-compatible Delta tables, storing table metadata in both Iceberg and Delta formats so external clients can read them. On the read side, Unity Catalog supports reading from external Iceberg catalogs.

When a dbt model is configured with the UniForm table properties, Databricks duplicates the Delta metadata into Iceberg-compatible metadata. This allows external Iceberg compute engines to read the table from Unity Catalog.

Example SQL:

{{ config(
    tblproperties={
      'delta.enableIcebergCompatV2': 'true',
      'delta.universalFormat.enabledFormats': 'iceberg'
    }
) }}
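
Once UniForm metadata exists, an external engine can read the table through Unity Catalog's Iceberg REST catalog interface. Below is a minimal sketch of pointing a Spark SQL session at that interface; the REST endpoint path, the unity catalog alias, the runtime version, and the token handling are assumptions to verify against the current Databricks and Apache Iceberg documentation:

# Hypothetical setup: read UniForm tables from external Spark via the Iceberg REST catalog.
# <workspace-host>, <uc-catalog-name>, and <token> are placeholders.
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
  --conf spark.sql.catalog.unity=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.unity.type=rest \
  --conf spark.sql.catalog.unity.uri=https://<workspace-host>/api/2.1/unity-catalog/iceberg \
  --conf spark.sql.catalog.unity.warehouse=<uc-catalog-name> \
  --conf spark.sql.catalog.unity.token=<token>

Inside the session, the table can then be queried as unity.<schema>.<table>.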

To read and query external tables from Databricks, configure Lakehouse Federation and register the catalog as a foreign catalog. This is configured outside of dbt; once complete, the foreign catalog appears as another database you can query, as sketched below.
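
As a rough illustration in Databricks SQL, a foreign catalog is created from an existing Lakehouse Federation connection; the connection name, option values, and identifiers below are placeholders rather than values from this guide:

-- Register an existing connection as a foreign catalog (names are placeholders).
CREATE FOREIGN CATALOG IF NOT EXISTS my_foreign_catalog
USING CONNECTION my_iceberg_connection
OPTIONS (database 'my_database');

-- Once created, it behaves like any other catalog in queries:
SELECT * FROM my_foreign_catalog.my_schema.my_table;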

We do not currently support the new Private Preview features of Databricks managed Iceberg tables.

dbt Catalog Integration Configurations for Databricks

The following table outlines the configuration fields required to set up a catalog integration for Iceberg-compatible tables in Databricks.

| Field | Description | Required | Accepted values |
| --- | --- | --- | --- |
| name | Name of the catalog on Databricks | Yes | e.g. "my_unity_catalog" |
| catalog_type | Type of catalog | Yes | unity, hive_metastore |
| external_volume | Storage location of your data | Optional | See the Databricks documentation |
| table_format | Table format your dbt models will be materialized as | Optional | Defaults to delta unless overridden in your Databricks account |
| adapter_properties | Additional platform-specific properties | Optional | See below for accepted values |

Adapter Properties

These are additional configurations that can be supplied and nested under adapter_properties for further configurability.

| Field | Description | Required | Accepted values |
| --- | --- | --- | --- |
| file_format | File format used for the materialized table | Optional | e.g. parquet |

Example:

adapter_properties:
  file_format: parquet

Configure catalog integration for managed Iceberg tables

  1. Create a catalogs.yml file at the top level of your dbt project (at the same level as dbt_project.yml).

    An example using Unity Catalog as the catalog:

catalogs:
  - name: unity_catalog
    active_write_integration: unity_catalog_integration
    write_integrations:
      - name: unity_catalog_integration
        table_format: iceberg
        catalog_type: unity
        adapter_properties:
          file_format: parquet

  2. Add the catalog_name config parameter in either the SQL config (inside the .sql model file), a property file (in the model folder), or your dbt_project.yml (a dbt_project.yml sketch follows the SQL example below).

An example of iceberg_model.sql:

{{
    config(
        materialized = 'table',
        catalog_name = 'unity_catalog'
    )
}}

select * from {{ ref('jaffle_shop_customers') }}
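
The same configuration can instead live in dbt_project.yml, as mentioned in step 2. A minimal sketch, assuming a hypothetical project named my_project with models under a marts/ folder:

models:
  my_project:
    marts:
      +materialized: table
      +catalog_name: unity_catalog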

  3. Execute the dbt model with dbt run -s iceberg_model.
