Connect Apache Spark
dbt-databricks

If you're using Databricks, the dbt-databricks adapter is recommended over dbt-spark. If you're still using dbt-spark with Databricks, consider migrating from the dbt-spark adapter to the dbt-databricks adapter. See Connect Databricks for the Databricks version of this page.
dbt Cloud supports connecting to an Apache Spark cluster using the HTTP method or the Thrift method. Note: While the HTTP method can be used to connect to an all-purpose Databricks cluster, the ODBC method is recommended for all Databricks connections. For further details on configuring these connection parameters, please see the dbt-spark documentation.
To learn how to optimize performance with data platform-specific configurations in dbt Cloud, refer to Apache Spark-specific configuration.
The following fields are available when creating an Apache Spark connection using the HTTP and Thrift connection methods:
| Field | Description | Examples |
| ----- | ----------- | -------- |
| Host Name | The hostname of the Spark cluster to connect to | `yourorg.sparkhost.com` |
| Port | The port to connect to Spark on | `443` |
| Organization | Optional; the Databricks organization ID (default: 0) | `0123456789` |
| Cluster | The ID of the cluster to connect to | `1234-567890-abc12345` |
| Connection Timeout | The number of seconds to wait before a connection attempt times out | `10` |
| Connection Retries | The number of times to retry connecting to the cluster before failing | `10` |
| User | Optional; the username to connect as | `dbt_cloud_user` |
| Auth | Optional; set when using Kerberos authentication | `KERBEROS` |
| Kerberos Service Name | Optional; supply when using Kerberos authentication | `hive` |
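
dbt Cloud collects these values in the connection settings UI, but it can help to see how they map onto a dbt-spark profile. The sketch below shows roughly equivalent `profiles.yml` targets for the HTTP and Thrift methods; the profile name, hostnames, cluster ID, token variable, and schema are placeholder values, so substitute your own.

```yaml
spark_project:
  target: http_cluster
  outputs:
    # HTTP method: connects to an all-purpose Databricks cluster over HTTPS.
    http_cluster:
      type: spark
      method: http
      host: yourorg.sparkhost.com            # Host Name
      port: 443                              # Port
      organization: "0123456789"             # Organization (optional)
      cluster: 1234-567890-abc12345          # Cluster ID
      token: "{{ env_var('SPARK_TOKEN') }}"  # placeholder credential
      schema: analytics
      connect_timeout: 10                    # Connection Timeout, in seconds
      connect_retries: 10                    # Connection Retries

    # Thrift method: connects directly to a Spark Thrift server,
    # optionally authenticating with Kerberos.
    thrift_cluster:
      type: spark
      method: thrift
      host: spark-thrift.internal.example.com
      port: 10001
      user: dbt_cloud_user                   # User (optional)
      auth: KERBEROS                         # Auth (optional)
      kerberos_service_name: hive            # Kerberos Service Name (optional)
      schema: analytics
      connect_timeout: 10
      connect_retries: 10
```

The `auth` and `kerberos_service_name` keys only apply to the Thrift method, while `organization`, `cluster`, and `token` only apply to the HTTP method; the remaining fields are shared by both.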
