# Databricks setup

## Overview of dbt-databricks
- Maintained by: Databricks
- Authors: some dbt loving Bricksters
- GitHub repo: databricks/dbt-databricks
- PyPI package: `dbt-databricks`
- Slack channel: #db-databricks-and-spark
- Supported dbt Core version: v0.18.0 and newer
- dbt Cloud support: Supported
- Minimum data platform version: n/a
## Installation and Distribution

### Installing dbt-databricks

pip is the easiest way to install the adapter:

```shell
pip install dbt-databricks
```

Installing `dbt-databricks` will also install `dbt-core` and any other dependencies.
### Configuring dbt-databricks

For Databricks-specific configuration, please refer to Databricks Configuration.
For further info, refer to the GitHub repository: databricks/dbt-databricks
## Set up a Databricks Target
dbt-databricks can connect to Databricks all-purpose clusters as well as SQL endpoints. The latter provides an opinionated way of running SQL workloads with optimal performance and price; the former provides all the flexibility of Spark.
```yaml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: [optional catalog name; Unity Catalog only, available in dbt-databricks>=1.1.1]
      schema: [schema name]
      host: [yourorg.databrickshost.com]
      http_path: [/sql/your/http/path]
      token: [dapiXXXXXXXXXXXXXXXXXXXXXXX] # Personal Access Token (PAT)
      threads: [1 or more]  # optional, default 1
```
See the Databricks documentation on how to obtain the credentials for configuring your profile.
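For illustration, a profile with the placeholders filled in might look like the following. The workspace host, HTTP path, and schema name are all hypothetical; substitute the values from your own Databricks workspace:

```yaml
my_databricks_profile:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: analytics                            # hypothetical schema name
      host: abc-12345678.cloud.databricks.com      # hypothetical workspace host
      http_path: /sql/1.0/endpoints/0123456789abcdef  # hypothetical SQL endpoint path
      token: dapiXXXXXXXXXXXXXXXXXXXXXXX           # your PAT; never commit real tokens
      threads: 4
```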
## Caveats

### Supported Functionality

Most dbt Core functionality is supported, but some features are only available on Delta Lake.

Delta-only features:

- Incremental model updates by `unique_key` instead of `partition_by` (see the `merge` strategy)
- Snapshots
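As a sketch of the `merge` strategy on Delta Lake, a minimal incremental model keyed on `unique_key` might look like this. The model, source, and column names are hypothetical:

```sql
-- models/orders_incremental.sql (hypothetical model)
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'merge',
        unique_key = 'order_id'
    )
}}

select
    order_id,
    customer_id,
    order_updated_at
from {{ source('shop', 'orders') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows newer than those already in the target
  where order_updated_at > (select max(order_updated_at) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; on subsequent runs, rows matching an existing `order_id` are updated in place and new rows are inserted.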
## Choosing between dbt-databricks and dbt-spark

While `dbt-spark` can be used to connect to Databricks, `dbt-databricks` was created to make it even easier to use dbt with the Databricks Lakehouse. `dbt-databricks` includes:
- No need to install additional drivers or dependencies for use on the CLI
- Use of Delta Lake for all models out of the box
- SQL macros that are optimized to run with Photon
## Support for Unity Catalog
The `dbt-databricks>=1.1.1` adapter supports the 3-level namespace of Unity Catalog (catalog / schema / relations) so you can organize and secure your data the way you like.
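For example, with Unity Catalog enabled, a model can reference relations by all three levels of the namespace; the catalog, schema, and table names below are hypothetical:

```sql
-- models/customer_orders.sql: joining across two hypothetical catalogs
select
    c.customer_id,
    o.order_id
from prod_catalog.crm.customers as c
join dev_catalog.shop.orders as o
  on o.customer_id = c.customer_id
```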