Skip to main content

Apache Impala configurations

Configuring tables

When materializing a model as table, you may include several optional configs that are specific to the dbt-impala plugin, in addition to the standard model configs.

OptionDescriptionRequired?Example
partition_bypartition by a column, typically a directory per partition is createdNopartition_by=['name']
sort_bysort by a columnNosort_by=['age']
row_formatformat to be used when storing individual arowsNorow_format='delimited'
stored_asunderlying storage format of the tableNostored_as='PARQUET'
locationstorage location, typically an hdfs pathNoLOCATION='/user/etl/destination'
commentcomment for the tableNocomment='this is the cleanest model'
serde_propertiesSerDes ([de-]serialization) prperties of tableNoserde_properties="('quoteChar'=''', 'escapeChar'='\')"
tbl_propertiesany metadata can be stored as key/value pair with the tableNotbl_properties="('dbt_test'='1')"
is_cachedtrue or false - if this table is cachedNois_cached=false (default)
cache_poolcache pool name to use if is_cached is set to trueNo
replication_factorcache replication factor to use if is_cached is set to trueNo
externalis this an external table - true / falseNoexternal=true

For Cloudera specific options for above parameters see documentation of CREATE TABLE (https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_create_table.html)

Incremental models

Supported modes for incremental model:

  • append (default): Insert new records without updating or overwriting any existing data.
  • insert_overwrite: For new records, insert data. When used along with partition clause, update data for changed record and insert data for new records.

Unsupported modes:

  • unique_key This is not suppored option for incremental models in dbt-impala
  • merge: Merge is not supported by the underlying warehouse, and hence not supported by dbt-impala

Example: Using partition_by config option

impala_partition_by.sql
{{
config(
materialized='table',
unique_key='id',
partition_by=['city'],
)
}}

with source_data as (
select 1 as id, "Name 1" as name, "City 1" as city,
union all
select 2 as id, "Name 2" as name, "City 2" as city,
union all
select 3 as id, "Name 3" as name, "City 2" as city,
union all
select 4 as id, "Name 4" as name, "City 1" as city,
)

select * from source_data

In the above example, a sample table is created with partition_by and other config options. One thing to note when using partition_by option is that the select query should always have the column name used in partition_by option as the last one, as can be seen for the city column name used in the above query. If the partition_by clause is not the same as the last column in select statement, Impala will flag an error when trying to create the model.

Was this page helpful?

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

0