Connect Apache Spark

If you're using Databricks, use dbt-databricks

If you're using Databricks, the dbt-databricks adapter is recommended over dbt-spark. If you're still using dbt-spark with Databricks, consider migrating from the dbt-spark adapter to the dbt-databricks adapter.


See Connect Databricks for the Databricks version of this page.

dbt Cloud supports connecting to an Apache Spark cluster using either the HTTP method or the Thrift method. Note: Although the HTTP method can be used to connect to an all-purpose Databricks cluster, the ODBC method is recommended for all Databricks connections. For details on configuring these connection parameters, see the dbt-spark documentation.

To learn how to optimize performance with data platform-specific configurations in dbt Cloud, refer to Apache Spark-specific configuration.

The following fields are available when creating an Apache Spark connection using the HTTP and Thrift connection methods:

| Field | Description | Example |
| --- | --- | --- |
| Host Name | The hostname of the Spark cluster to connect to | |
| Port | The port to connect to Spark on | 443 |
| Organization | Optional (default: 0) | 0123456789 |
| Cluster | The ID of the cluster to connect to | 1234-567890-abc12345 |
| Connection Timeout | Number of seconds after which a connection attempt times out | 10 |
| Connection Retries | Number of times to attempt connecting to the cluster before failing | 10 |
| Auth | Optional, supply if using Kerberos | KERBEROS |
| Kerberos Service Name | Optional, supply if using Kerberos | hive |
Figure: Configuring a Spark connection
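
It can help to sanity-check the connection values before entering them in the form. The following is a minimal Python sketch of such a check, not part of dbt Cloud or dbt-spark. The `SparkConnectionFields` class and `validate` function are hypothetical names introduced here for illustration. The defaults for port, timeout, and retries are simply the example values from the table rather than documented defaults, and the rule that a Kerberos service name must accompany `KERBEROS` auth is an assumption based on the "supply if using Kerberos" notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SparkConnectionFields:
    """Hypothetical container mirroring the connection form fields."""
    host: str
    cluster: Optional[str] = None
    port: int = 443                      # example value from the table, not a documented default
    organization: str = "0"              # the table lists 0 as the default
    connect_timeout: int = 10            # seconds; example value from the table
    connect_retries: int = 10            # example value from the table
    auth: Optional[str] = None           # "KERBEROS" when using Kerberos
    kerberos_service_name: Optional[str] = None


def validate(fields: SparkConnectionFields) -> List[str]:
    """Return a list of problems found; an empty list means the values look sane."""
    problems = []
    if not fields.host.strip():
        problems.append("Host Name is required")
    if not 1 <= fields.port <= 65535:
        problems.append("Port must be between 1 and 65535")
    if fields.connect_timeout <= 0:
        problems.append("Connection Timeout must be a positive number of seconds")
    if fields.connect_retries < 0:
        problems.append("Connection Retries cannot be negative")
    # Assumption: a service name should accompany Kerberos auth.
    if fields.auth == "KERBEROS" and not fields.kerberos_service_name:
        problems.append("Kerberos Service Name should be supplied when Auth is KERBEROS")
    return problems


# Usage: the host below is a placeholder, not a real cluster.
example = SparkConnectionFields(
    host="spark.example.internal",
    cluster="1234-567890-abc12345",
    auth="KERBEROS",
    kerberos_service_name="hive",
)
print(validate(example))  # prints []
```

A check like this only catches obviously malformed values. Whether the cluster is reachable and accepts the credentials can only be confirmed by testing the connection in dbt Cloud itself.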