Migrating from dbt-spark to dbt-databricks
You can migrate your projects from the dbt-spark adapter to the dbt-databricks adapter. In collaboration with dbt Labs, Databricks built this adapter using dbt-spark as the foundation and added some critical improvements. With it, you get an easier setup, requiring only three inputs for authentication, and more features such as support for Unity Catalog.
Previously, you had to provide a cluster or endpoint ID, which was hard to parse from the http_path that you were given. Now, it doesn't matter if you're using a cluster or an SQL endpoint because the dbt-databricks setup requires the same inputs for both. All you need to provide is:
- hostname of the Databricks workspace
- HTTP path of the Databricks SQL warehouse or cluster
- appropriate credentials
The dbt-databricks adapter provides better defaults than dbt-spark does. The defaults help optimize your workflow so you can get the fast performance and cost-effectiveness of Databricks. They are:
- The dbt models use the Delta table format. You can remove any declared configurations of file_format = 'delta' since they're now redundant.
- Accelerate your expensive queries with the Photon engine.
- The incremental_strategy config is set to merge.
With dbt-spark, however, the default for incremental_strategy is append. If you want to continue using incremental_strategy=append, you must set this config specifically on your incremental models. If you already specified incremental_strategy=merge on your incremental models, you don't need to change anything when moving to dbt-databricks, but you can keep your models tidy by removing the config since it's redundant. Read About incremental_strategy to learn more.
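As a minimal sketch of keeping the append behavior after the move, the config can be set in dbt_project.yml so that it applies to a group of incremental models. The project name my_project and the folder name incremental_models are hypothetical placeholders; substitute the names from your own project.

```yaml
# dbt_project.yml (fragment)
# Assumption: "my_project" and "incremental_models" are placeholder names.
models:
  my_project:
    incremental_models:
      # Override the dbt-databricks default so these models
      # keep appending rows instead of merging.
      +incremental_strategy: append
```

The same config can instead be set per model in its config block if only a few models need it.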
For more information on defaults, see Caveats.
If you use dbt Core, you no longer have to download an independent driver to interact with Databricks. The connection information is all embedded in a pure-Python library called
Migrate your dbt projects
In both dbt Core and dbt Cloud, you can migrate your projects to the Databricks-specific adapter from the generic Apache Spark adapter.
- Your project must be compatible with dbt 1.0 or greater. Refer to Upgrading to v1.0 for details. For the latest version of dbt, refer to Upgrading to v1.3.
- For dbt Cloud, you need administrative (admin) privileges to migrate dbt projects.
- dbt Cloud
- dbt Core
The migration to the dbt-databricks adapter from dbt-spark shouldn't cause any downtime for production jobs. dbt Labs recommends that you schedule the connection change when usage of the IDE is light to avoid disrupting your team.
To update your Databricks connection in dbt Cloud:
- Select Account Settings in the main navigation bar.
- On the Projects tab, find the project you want to migrate to the dbt-databricks adapter.
- Click the hyperlinked Connection for the project.
- Click Edit in the top right corner.
- Select Databricks for the warehouse.
- Select Databricks (dbt-databricks) for the adapter and enter the:
- (optional) catalog name
- Click Save.
Everyone in your organization who uses dbt Cloud must refresh the IDE before starting work again. It should refresh in less than a minute.
About your credentials
When you update the Databricks connection in dbt Cloud, your team will not lose their credentials. This makes migrating easier since it only requires you to delete the Databricks connection and re-add the cluster or endpoint information.
These credentials will not get lost when there's a successful connection to Databricks using the dbt-spark ODBC method:
- The credentials you supplied to dbt Cloud to connect to your Databricks workspace.
- The personal access tokens your team added in their dbt Cloud profile so they can develop in the IDE for a given project.
- The access token you added for each deployment environment so dbt Cloud can connect to Databricks during production jobs.
To migrate your dbt Core projects to the dbt-databricks adapter from dbt-spark:
- Install the dbt-databricks adapter in your environment.
- Update your Databricks connection by modifying your profiles.yml file.
Anyone who's using your project must also make these changes in their environment.
You can use the following examples of the profiles.yml file to see the authentication setup with dbt-spark compared to the simpler setup with dbt-databricks when connecting to an SQL endpoint. A cluster example would look similar.
An example of what authentication looks like with dbt-spark:
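A minimal sketch of a dbt-spark target that connects to an SQL endpoint through the ODBC method might look like the following. Every value here is a hypothetical placeholder: the profile name, schema, host, endpoint ID, driver path, and the DATABRICKS_TOKEN environment variable are assumptions to replace with your own.

```yaml
# profiles.yml (dbt-spark, ODBC method)
# Assumption: all names and values below are placeholders.
your_profile_name:
  target: dev
  outputs:
    dev:
      type: spark
      method: odbc
      # Path to a separately installed ODBC driver
      driver: /path/to/your/odbc/driver
      schema: my_schema
      host: your-workspace.cloud.databricks.com
      # Endpoint ID, which has to be extracted from the http_path
      endpoint: 0123456789abcdef
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```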
An example of how much simpler authentication is with dbt-databricks:
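A minimal sketch of the equivalent dbt-databricks target is shown below. It needs only the hostname, the HTTP path, and a credential, with no driver path or separately parsed endpoint ID. As before, the profile name, schema, host, HTTP path, and the DATABRICKS_TOKEN environment variable are hypothetical placeholders.

```yaml
# profiles.yml (dbt-databricks)
# Assumption: all names and values below are placeholders.
your_profile_name:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: my_schema
      host: your-workspace.cloud.databricks.com
      # Use the HTTP path exactly as given for the SQL warehouse or cluster
      http_path: /sql/1.0/endpoints/0123456789abcdef
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      # Optional: set a catalog name when using Unity Catalog
      # catalog: my_catalog
```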