
Databricks Profile

Overview of dbt-databricks

Maintained by: some dbt loving Bricksters
Author: Databricks
Source: GitHub
dbt Cloud: Coming Soon
dbt Slack channel


Installation and Distribution

The easiest way to install dbt-databricks is to use pip:

pip install dbt-databricks
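
To confirm the adapter installed correctly, dbt's version output lists the installed plugins:

dbt --version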

Set up a Databricks Target

dbt-databricks can connect to Databricks all-purpose clusters as well as SQL endpoints. SQL endpoints provide an opinionated way to run SQL workloads at an optimal price and performance, while all-purpose clusters offer the full flexibility of Spark.

~/.dbt/profiles.yml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: [schema name]
      host: [yourorg.databrickshost.com]
      http_path: [/sql/your/http/path]
      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX] # Personal Access Token (PAT)

See the Databricks documentation for how to obtain the credentials needed to configure your profile.
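
Once the profile is saved, you can verify the connection with dbt's built-in debug command, which validates the profile file, the adapter, and connectivity to your workspace:

dbt debug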

Caveats

Supported Functionality

Most dbt Core functionality is supported, but some features are only available on Delta Lake.

Delta-only features:

  1. Incremental model updates by unique_key instead of partition_by (see merge strategy; a sketch follows this list)
  2. Snapshots
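
For illustration, a minimal incremental model using the merge strategy might look like the following; the model name, the raw.events source, and the event_id/event_ts columns are hypothetical:

-- models/events_incremental.sql (hypothetical model; assumes a raw.events source)
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'merge',
        unique_key = 'event_id',
        file_format = 'delta'
    )
}}

select * from {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what the target table holds
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}

On subsequent runs, dbt merges incoming rows into the Delta table on event_id rather than overwriting partitions.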

Choosing between dbt-databricks and dbt-spark

While dbt-spark can be used to connect to Databricks, dbt-databricks was created to make it even easier to use dbt with the Databricks Lakehouse.

dbt-databricks includes:

  • No need to install additional drivers or dependencies for use on the CLI
  • Use of Delta Lake for all models out of the box
  • SQL macros that are optimized to run with Photon