Overview of dbt-databricks

Maintained by: some dbt-loving Bricksters
Author: Databricks
Source: GitHub
dbt Cloud: Coming Soon

Installation and Distribution

The easiest way to install dbt-databricks is to use pip:

pip install dbt-databricks

Set up a Databricks Target

dbt-databricks can connect to Databricks all-purpose clusters as well as SQL endpoints. SQL endpoints provide an opinionated way of running SQL workloads with optimal performance and price; all-purpose clusters provide all the flexibility of Spark.

your_profile_name: # placeholder profile name
  target: dev
  outputs:
    dev:
      type: databricks
      schema: [schema name]
      host: []
      http_path: [/sql/your/http/path]
      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX] # Personal Access Token (PAT)

See the Databricks documentation on how to obtain the credentials for configuring your profile.
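
As a rough illustration of the two connection types, the sketch below defines one output per compute type in profiles.yml. The profile name, output names, schema, the DATABRICKS_TOKEN environment variable, and the IDs inside each http_path are placeholders invented for this example. The http_path shapes shown are the ones Databricks workspaces typically display and are an assumption here, so copy the real host and HTTP path from your own workspace. Reading the token through dbt's env_var function keeps the PAT out of the file.

```yaml
# profiles.yml -- illustrative sketch; every name and ID below is a placeholder
my_databricks_profile:
  target: endpoint                # default output used when no --target is given
  outputs:
    endpoint:                     # connects to a SQL endpoint
      type: databricks
      schema: analytics
      host: <your-workspace-host>
      http_path: /sql/1.0/endpoints/<endpoint-id>        # assumed typical shape
      token: "{{ env_var('DATABRICKS_TOKEN') }}"         # PAT read from the environment
    cluster:                      # connects to an all-purpose cluster
      type: databricks
      schema: analytics
      host: <your-workspace-host>
      http_path: sql/protocolv1/o/<org-id>/<cluster-id>  # assumed typical shape
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```

With a profile laid out this way, running dbt run --target cluster would switch a single invocation to the all-purpose cluster while leaving the SQL endpoint as the default.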


Supported Functionality

Most dbt Core functionality is supported, but some features are only available on Delta Lake.

Delta-only features:

  1. Incremental model updates by unique_key instead of partition_by (see merge strategy)
  2. Snapshots
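
To make the first item concrete, here is a minimal sketch of a dbt_project.yml fragment that configures a folder of models to be merged on a key column. The project name (my_project), folder name (events), and key column (event_id) are hypothetical. The config keys themselves are the standard dbt incremental settings.

```yaml
# dbt_project.yml -- illustrative sketch; project, folder, and column names are hypothetical
models:
  my_project:
    events:
      +materialized: incremental
      +file_format: delta            # the merge strategy relies on Delta Lake
      +incremental_strategy: merge
      +unique_key: event_id          # rows matching on this column are updated, others inserted
```

With these settings, each incremental run updates existing rows that share a unique_key value with incoming rows and inserts the rest, rather than overwriting whole partitions.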

Choosing between dbt-databricks and dbt-spark

While dbt-spark can be used to connect to Databricks, dbt-databricks was created to make it even easier to use dbt with the Databricks Lakehouse.

dbt-databricks offers:

  • No need to install additional drivers or dependencies for use on the CLI
  • Use of Delta Lake for all models out of the box
  • SQL macros that are optimized to run with Photon