
Databricks setup

Overview of dbt-databricks

  • Maintained by: Databricks
  • Authors: some dbt loving Bricksters
  • GitHub repo: databricks/dbt-databricks
  • PyPI package: dbt-databricks
  • Slack channel: #db-databricks-and-spark
  • Supported dbt Core version: v0.18.0 and newer
  • dbt Cloud support: Supported
  • Minimum data platform version: n/a

Installation and Distribution

Installing dbt-databricks

pip is the easiest way to install the adapter:

pip install dbt-databricks

Installing dbt-databricks will also install dbt-core and any other dependencies.
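If you need a version-gated feature such as Unity Catalog support, you can pin the adapter at install time and then confirm what was installed; the version constraint below is only an example:

pip install "dbt-databricks>=1.1.1"
dbt --version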

Configuring dbt-databricks

For Databricks-specific configuration, refer to Databricks Configuration.

For further info, refer to the GitHub repository: databricks/dbt-databricks

Set up a Databricks Target

dbt-databricks can connect to Databricks all-purpose clusters as well as SQL endpoints. The latter provides an opinionated way of running SQL workloads with optimal performance and price; the former provides all the flexibility of Spark.

~/.dbt/profiles.yml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: [optional catalog name; using Unity Catalog requires dbt-databricks>=1.1.1]
      schema: [schema name]
      host: [yourorg.databrickshost.com]
      http_path: [/sql/your/http/path]
      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX] # Personal Access Token (PAT)
      threads: [1 or more] # optional, default 1

See the Databricks documentation on how to obtain the credentials for configuring your profile.
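Once the profile is saved, you can verify that dbt can reach your cluster or SQL endpoint with dbt debug; this assumes the profile key in your dbt_project.yml matches your_profile_name:

dbt debug --target dev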

Caveats

Supported Functionality

Most dbt Core functionality is supported, but some features are only available on Delta Lake.

Delta-only features:

  1. Incremental model updates by unique_key instead of partition_by (see merge strategy and the sketch after this list)
  2. Snapshots

Choosing between dbt-databricks and dbt-spark

While dbt-spark can be used to connect to Databricks, dbt-databricks was created to make it even easier to use dbt with the Databricks Lakehouse.

dbt-databricks includes:

  • No need to install additional drivers or dependencies for use on the CLI
  • Use of Delta Lake for all models out of the box (see the note after this list)
  • SQL macros that are optimized to run with Photon
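
To illustrate the second point: with dbt-spark you would typically pin the file format yourself in dbt_project.yml, while dbt-databricks already defaults every model to Delta. A sketch of the dbt-spark configuration this saves you from writing:

dbt_project.yml (dbt-spark only; not needed with dbt-databricks)
models:
  +file_format: delta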

Support for Unity Catalog

The adapter dbt-databricks>=1.1.1 supports the 3-level namespace of Unity Catalog (catalog / schema / relations) so you can organize and secure your data the way you like.
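
As an illustration, with catalog set in your profile (see the profile example above), a model compiles to a three-level name such as <catalog>.<schema>.<model>, and you can also address relations in other catalogs explicitly; all names below are placeholders:

-- resolves to <catalog>.<schema>.orders when catalog is set in the profile
select * from {{ ref('orders') }}

-- or reference another catalog by its fully qualified three-level name
select * from other_catalog.reporting.daily_sales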
