Apache Hive setup

Overview of dbt-hive

  • Maintained by: Cloudera
  • Authors: Cloudera
  • GitHub repo: cloudera/dbt-hive
  • PyPI package: dbt-hive
  • Slack channel: #db-hive
  • Supported dbt Core version: v1.1.0 and newer
  • dbt Cloud support: Not Supported
  • Minimum data platform version: n/a

Installing dbt-hive

pip is the easiest way to install the adapter:

pip install dbt-hive

Installing dbt-hive will also install dbt-core and any other dependencies.

Configuring dbt-hive

For Hive-specific configuration, please refer to Hive Configuration.

For further info, refer to the GitHub repository: cloudera/dbt-hive

Connection Methods

dbt-hive can connect to Apache Hive and Cloudera Data Platform clusters. The Impyla library is used to establish connections to Hive.

dbt-hive supports two transport mechanisms:

  • binary
  • HTTP(S)

The default mechanism is binary. To use HTTP transport, set the boolean option use_http_transport: true in your profile.
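As a sketch, the transport-related keys sit alongside the other connection settings in profiles.yml (the http_path value below is a placeholder — check your HiveServer2 endpoint for the actual path):

```yaml
# Sketch: transport settings in a profiles.yml target.
# When use_http_transport is omitted, the adapter uses binary transport.
use_http_transport: true
use_ssl: true          # recommended whenever credentials travel over HTTP
http_path: cliservice  # placeholder; cluster-specific
```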

Authentication Methods

dbt-hive supports two authentication mechanisms:

  • insecure: No authentication is used; only recommended for testing.
  • ldap: Authentication via LDAP.


Insecure

This method is only recommended if you have a local installation of Hive and want to test out the dbt-hive adapter.

your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: localhost
      port: [port] # default value: 10000
      schema: [schema name]


LDAP

LDAP allows you to authenticate with a username and password when Hive is configured with LDAP authentication. LDAP is supported over both the binary and HTTP connection mechanisms.

This is the recommended authentication mechanism to use with Cloudera Data Platform (CDP).

your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: [host name]
      http_path: [optional, HTTP path to Hive] # default value: None
      port: [port] # default value: 10000
      auth_type: ldap
      use_http_transport: [true / false] # default value: true
      use_ssl: [true / false] # TLS should always be used with LDAP to ensure secure transmission of credentials; default value: true
      username: [username]
      password: [password]
      schema: [schema name]

Note: When creating a workload user in CDP, make sure the user has CREATE, SELECT, ALTER, INSERT, UPDATE, DROP, INDEX, READ, and WRITE permissions. If you need the user to execute GRANT statements, you should also configure the appropriate GRANT permissions for them. When using Apache Ranger, permissions that allow GRANT are typically set using the "Delegate Admin" option. For more information, see grants and on-run-start & on-run-end.
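If the workload user does have GRANT privileges, model-level grants can be declared in the standard dbt way; a minimal sketch, in which the model and role names are placeholders:

```yaml
# models/schema.yml (sketch): dbt issues the corresponding GRANT
# statement after building the model, so the connecting user needs
# permission to run GRANT on the target schema.
models:
  - name: my_model                  # placeholder model name
    config:
      grants:
        select: ['reporting_role']  # placeholder role
```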


Kerberos

The Kerberos authentication mechanism uses GSSAPI to share Kerberos credentials when Hive is configured with Kerberos authentication.

your_profile_name:
  target: dev
  outputs:
    dev:
      type: hive
      host: [hostname]
      port: [port] # default value: 10000
      auth_type: GSSAPI
      kerberos_service_name: [kerberos service name] # default value: None
      use_http_transport: true # default value: true
      use_ssl: true # TLS should always be used to ensure secure transmission of credentials; default value: true
      schema: [schema name]

Note: A typical setup of Cloudera Private Cloud involves the following steps to set up Kerberos before you can execute dbt commands:

  • Get the correct realm config file for your installation (krb5.conf)
  • Set environment variable to point to the config file (export KRB5_CONFIG=/path/to/krb5.conf)
  • Set correct permissions for config file (sudo chmod 644 /path/to/krb5.conf)
  • Obtain a Kerberos ticket using kinit (kinit username@YOUR_REALM.YOUR_DOMAIN)
  • The ticket is valid for a certain period, after which you will need to run kinit again to renew it
  • The user needs CREATE, DROP, and INSERT permissions on the schema provided in profiles.yml


Instrumentation

By default, the adapter collects instrumentation events to help improve functionality and understand bugs. To switch this off, for instance in a production environment, explicitly set the flag usage_tracking: false in your profiles.yml file.
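The flag sits alongside the other connection settings for the target; a sketch of the relevant portion of a profiles.yml target:

```yaml
# Sketch: opting a production target out of instrumentation events.
type: hive
host: [host name]
schema: [schema name]
usage_tracking: false  # disable instrumentation collection
```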

Installation and Distribution

dbt's adapter for Apache Hive is managed in its own repository, dbt-hive. To use it, you must install the dbt-hive plugin.

Using pip

The following command installs the latest version of dbt-hive, along with the required version of dbt-core and the impyla driver used for connections.

pip install dbt-hive

Supported Functionality

| Feature                                         | Supported |
| ----------------------------------------------- | --------- |
| Materialization: Table                          | Yes       |
| Materialization: View                           | Yes       |
| Materialization: Incremental - Append           | Yes       |
| Materialization: Incremental - Insert+Overwrite | Yes       |
| Materialization: Incremental - Merge            | No        |
| Materialization: Ephemeral                      | No        |
| Authentication: LDAP                            | Yes       |
| Authentication: Kerberos                        | Yes       |
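Because the merge strategy is not supported, incremental models on this adapter need to use append or insert_overwrite. One way to set that as a project-wide default is in dbt_project.yml; a sketch, in which the project name my_project is a placeholder:

```yaml
# dbt_project.yml (sketch): default incremental models to a strategy
# the adapter supports; individual models can still override this
# in their own config() block.
models:
  my_project:                      # placeholder project name
    +materialized: incremental
    +incremental_strategy: append  # or insert_overwrite
```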