
Apache Impala setup

  • Maintained by: Cloudera
  • Authors: Cloudera
  • GitHub repo: cloudera/dbt-impala
  • PyPI package: dbt-impala
  • Slack channel: #db-impala
  • Supported dbt Core version: v1.1.0 and newer
  • dbt Cloud support: Not Supported
  • Minimum data platform version: n/a

Installing dbt-impala

Use pip to install the adapter. Before dbt 1.8, installing the adapter automatically installed dbt-core and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install dbt-core. Adapter and dbt Core versions have been decoupled from each other, so the adapter no longer overwrites an existing dbt-core installation. Use the following command for installation:

python -m pip install dbt-core dbt-impala
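Because the adapter no longer installs dbt-core for you, it is worth confirming that both packages are in the same Python environment. Here is a minimal sketch that uses only the standard library. The helper name installed_version is hypothetical and is not part of dbt.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Since dbt 1.8, dbt-core and the adapter are installed as separate packages,
    # so both should report a version here.
    for pkg in ("dbt-core", "dbt-impala"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```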

Configuring dbt-impala

For Impala-specific configuration, please refer to Impala configs.

Connection Methods

dbt-impala can connect to Apache Impala and Cloudera Data Platform clusters.

The Impyla library is used to establish connections to Impala.

Two transport mechanisms are supported:

  • binary
  • HTTP(S)

The default mechanism is binary. To use HTTP transport, use the boolean option use_http_transport: [true / false].
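As a rough illustration, the sketch below maps the transport choice onto the underlying Impyla call. Impyla's impala.dbapi.connect() accepts use_http_transport and http_path keyword arguments. The helper transport_kwargs is hypothetical and is not taken from the dbt-impala source.

```python
def transport_kwargs(use_http_transport: bool, http_path: str = "") -> dict:
    """Translate the profile's transport settings into impyla connect() kwargs.

    Hypothetical helper for illustration only; not the adapter's own code.
    """
    if not use_http_transport:
        # Binary transport: an http_path, if given, has no effect and is dropped.
        return {"use_http_transport": False}
    kwargs = {"use_http_transport": True}
    if http_path:
        # Only HTTP(S) transport uses a path on the server.
        kwargs["http_path"] = http_path
    return kwargs
```

Requiring use_http_transport explicitly, rather than giving it a default, keeps the sketch from guessing which mechanism applies when the option is omitted.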

Authentication Methods

dbt-impala supports three authentication mechanisms:

  • insecure: No authentication is used; only recommended for testing.
  • ldap: Authentication via LDAP
  • kerberos: Authentication via Kerberos (GSSAPI)
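The sketch below shows one plausible mapping from these methods to the auth_mechanism names that Impyla's connect() understands (NOSASL, LDAP, GSSAPI). It assumes, without confirmation from the adapter source, that dbt-impala performs a translation along these lines. The names AUTH_MECHANISMS and impyla_auth_mechanism are hypothetical.

```python
# Hypothetical mapping from a profile auth_type to impyla's auth_mechanism.
# This is an assumption for illustration; check the adapter source for the exact behavior.
AUTH_MECHANISMS = {
    "insecure": "NOSASL",  # no authentication (impyla's mechanism for unsecured Impala)
    "ldap": "LDAP",        # username/password checked against LDAP
    "kerberos": "GSSAPI",  # Kerberos credentials shared via GSSAPI
    "gssapi": "GSSAPI",    # the Kerberos profile on this page uses auth_type GSSAPI
}


def impyla_auth_mechanism(auth_type: str) -> str:
    """Return the impyla auth_mechanism for a profile auth_type (case-insensitive)."""
    try:
        return AUTH_MECHANISMS[auth_type.strip().lower()]
    except KeyError:
        raise ValueError(f"unsupported auth_type: {auth_type!r}") from None
```

Normalizing case and whitespace lets both ldap and GSSAPI, as written in the profiles on this page, resolve to the same table.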

Insecure

This method is only recommended if you have a local install of Impala and want to test out the dbt-impala adapter.

~/.dbt/profiles.yml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [host] # default value: localhost
      port: [port] # default value: 21050
      dbname: [db name] # this should be the same as the schema name provided below; starting with 1.1.2 this parameter is optional
      schema: [schema name]
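Before pointing dbt at a local Impala, you may want to check that the coordinator accepts unauthenticated connections at all. This is a minimal sketch. It assumes Impyla is installed (pip install impyla) and that an insecure setup corresponds to Impyla's NOSASL mechanism. The helper names insecure_connect_kwargs and smoke_test are hypothetical.

```python
def insecure_connect_kwargs(host: str = "localhost", port: int = 21050) -> dict:
    """Connection settings mirroring the insecure profile's defaults (assumed NOSASL)."""
    return {"host": host, "port": port, "auth_mechanism": "NOSASL"}


def smoke_test(**kwargs) -> list:
    """Open a connection with impyla and run a trivial query."""
    # Imported lazily so the helper above can be used without impyla installed.
    from impala.dbapi import connect

    conn = connect(**kwargs)
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        return cur.fetchall()
    finally:
        conn.close()
```

Calling smoke_test(**insecure_connect_kwargs()) should return a single row if the coordinator is reachable. If it fails, dbt debug is likely to fail in the same way.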

LDAP

LDAP allows you to authenticate with a username and password when Impala is configured with LDAP authentication. LDAP is supported over both the binary and HTTP transport mechanisms.

This is the recommended authentication mechanism to use with Cloudera Data Platform (CDP).

~/.dbt/profiles.yml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [host name]
      http_path: [optional, http path to Impala]
      port: [port] # default value: 21050
      auth_type: ldap
      use_http_transport: [true / false] # default value: true
      use_ssl: [true / false] # TLS should always be used with LDAP to ensure secure transmission of credentials, default value: true
      username: [username]
      password: [password]
      dbname: [db name] # this should be the same as the schema name provided below; starting with 1.1.2 this parameter is optional
      schema: [schema name]
      retries: [retries] # number of times impyla retries the connection to the warehouse, default value: 3
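To test LDAP credentials outside dbt, the same settings can be expressed as keyword arguments for Impyla's impala.dbapi.connect(), which accepts user, password, use_ssl, use_http_transport, http_path and retries. This is a sketch: the helper ldap_connect_kwargs is hypothetical, and its defaults simply copy the values documented in the profile above.

```python
def ldap_connect_kwargs(host: str, username: str, password: str, port: int = 21050,
                        use_http_transport: bool = True, use_ssl: bool = True,
                        http_path: str = "", retries: int = 3) -> dict:
    """Build impyla connect() kwargs mirroring the LDAP profile (hypothetical helper)."""
    if not use_ssl:
        # Deliberate guard in this sketch: without TLS the LDAP password travels in clear text.
        raise ValueError("use_ssl should be true when authenticating with LDAP")
    kwargs = {
        "host": host,
        "port": port,
        "auth_mechanism": "LDAP",
        "user": username,  # impyla calls this parameter `user`
        "password": password,
        "use_ssl": True,
        "use_http_transport": use_http_transport,
        "retries": retries,
    }
    if use_http_transport and http_path:
        kwargs["http_path"] = http_path
    return kwargs
```

The result can be passed straight to impala.dbapi.connect(**kwargs). In profiles.yml itself, dbt's env_var() function keeps the password out of the file, for example password: "{{ env_var('DBT_IMPALA_PASSWORD') }}", where the variable name is only an example.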

Note: When creating a workload user in CDP, ensure that the user has CREATE, SELECT, ALTER, INSERT, UPDATE, DROP, INDEX, READ and WRITE permissions. If the user needs to execute GRANT statements, for instance through grants (https://docs.getdbt.com/reference/resource-configs/grants) or on-run-start/on-run-end hooks (https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end), the appropriate GRANT permissions must also be configured. When using Apache Ranger, permissions that allow GRANT are typically set using the "Delegate Admin" option.

Kerberos

The Kerberos authentication mechanism uses GSSAPI to share Kerberos credentials when Impala is configured with Kerberos Auth.

~/.dbt/profiles.yml
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [hostname]
      port: [port] # default value: 21050
      auth_type: [GSSAPI]
      kerberos_service_name: [kerberos service name] # default value: None
      use_http_transport: true # default value: true
      use_ssl: true # TLS should always be used to protect data in transit, default value: true
      dbname: [db name] # this should be the same as the schema name provided below; starting with 1.1.2 this parameter is optional
      schema: [schema name]
      retries: [retries] # number of times impyla retries the connection to the warehouse, default value: 3
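For comparison with the LDAP case, here is a sketch of the equivalent Impyla keyword arguments for Kerberos. No username or password is passed, because GSSAPI takes the credentials from the Kerberos ticket cache. The helper kerberos_connect_kwargs is hypothetical. It omits kerberos_service_name when unset so that Impyla's own default ("impala") applies.

```python
from typing import Optional


def kerberos_connect_kwargs(host: str, port: int = 21050,
                            kerberos_service_name: Optional[str] = None,
                            use_http_transport: bool = True, use_ssl: bool = True,
                            retries: int = 3) -> dict:
    """Build impyla connect() kwargs mirroring the Kerberos profile (hypothetical helper)."""
    kwargs = {
        "host": host,
        "port": port,
        "auth_mechanism": "GSSAPI",  # credentials come from the ticket cache, not a password
        "use_http_transport": use_http_transport,
        "use_ssl": use_ssl,
        "retries": retries,
    }
    if kerberos_service_name is not None:
        # Only pass it when set; otherwise impyla falls back to its own default.
        kwargs["kerberos_service_name"] = kerberos_service_name
    return kwargs
```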

Note: A typical setup of Cloudera EDH involves the following steps to set up Kerberos before you can execute dbt commands:

  • Get the correct realm config file for your installation (krb5.conf)
  • Set an environment variable to point to the config file (export KRB5_CONFIG=/path/to/krb5.conf)
  • Set the correct permissions for the config file (sudo chmod 644 /path/to/krb5.conf)
  • Obtain a Kerberos ticket using kinit (kinit username@YOUR_REALM.YOUR_DOMAIN)
  • The ticket is valid for a certain period, after which you need to run kinit again to renew it.
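The KRB5_CONFIG and kinit steps can also be scripted, for example in a scheduled job that runs dbt. This is a minimal sketch. It assumes the MIT Kerberos command-line tools, where klist -s exits with status 0 only if a valid, non-expired ticket cache exists. The helper names are hypothetical, and kinit still prompts for the password on the terminal.

```python
import os
import shutil
import subprocess


def kerberos_env(krb5_conf: str) -> dict:
    """Copy of the current environment with KRB5_CONFIG pointing at the realm config."""
    env = dict(os.environ)  # copy, so os.environ itself is not modified
    env["KRB5_CONFIG"] = krb5_conf
    return env


def has_valid_ticket(env: dict) -> bool:
    """Check the ticket cache with MIT `klist -s` (silent; exit status 0 means valid)."""
    if shutil.which("klist") is None:
        return False
    return subprocess.run(["klist", "-s"], env=env).returncode == 0


def ensure_ticket(principal: str, krb5_conf: str) -> dict:
    """Run kinit only when no valid ticket exists; return the environment to reuse."""
    env = kerberos_env(krb5_conf)
    if not has_valid_ticket(env):
        # Raises CalledProcessError if kinit fails (for example, a wrong password).
        subprocess.run(["kinit", principal], env=env, check=True)
    return env
```

Because KRB5_CONFIG is only set for child processes, dbt should be launched with the returned environment, for example subprocess.run(["dbt", "debug"], env=env).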

Instrumentation

By default, the adapter sends instrumentation events to Cloudera to help improve functionality and understand bugs. To switch this off, for instance in a production environment, explicitly set the flag usage_tracking: false in your profiles.yml file.

Relatedly, if you'd like to turn off dbt Labs' anonymous usage tracking, see YAML Configurations: Send anonymous usage stats for more info.

Supported Functionality

Name | Supported
--- | ---
Materialization: Table | Yes
Materialization: View | Yes
Materialization: Incremental - Append | Yes
Materialization: Incremental - Insert+Overwrite | Yes
Materialization: Incremental - Merge | No
Materialization: Ephemeral | No
Seeds | Yes
Tests | Yes
Snapshots | Yes
Documentation | Yes
Authentication: LDAP | Yes
Authentication: Kerberos | Yes