Skip to main content

Apache Impala configurations

Configuring tables

When materializing a model as table, you may include several optional configs that are specific to the dbt-impala plugin, in addition to the standard model configs.

OptionDescriptionRequired?Example
partition_bypartition by a column, typically a directory per partition is createdNopartition_by=['name']
sort_bysort by a columnNosort_by=['age']
row_formatformat to be used when storing individual arowsNorow_format='delimited'
stored_asunderlying storage format of the tableNostored_as='PARQUET'
locationstorage location, typically an hdfs pathNoLOCATION='/user/etl/destination'
commentcomment for the tableNocomment='this is the cleanest model'
serde_propertiesSerDes ([de-]serialization) prperties of tableNoserde_properties="('quoteChar'='\'', 'escapeChar'='\')"
tbl_propertiesany metadata can be stored as key/value pair with the tableNotbl_properties="('dbt_test'='1')"
is_cachedtrue or false - if this table is cachedNois_cached=false (default)
cache_poolcache pool name to use if is_cached is set to trueNo
replication_factorcache replication factor to use if is_cached is set to trueNo
externalis this an external table - true / falseNoexternal=true

For Cloudera specific options for above parameters see documentation of CREATE TABLE (https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_create_table.html)

Incremental models

Supported modes for incremental model:

  • append (default): Insert new records without updating or overwriting any existing data.
  • insert_overwrite: For new records, insert data. When used along with partition clause, update data for changed record and insert data for new records.

Unsupported modes:

  • unique_key This is not suppored option for incremental models in dbt-impala
  • merge: Merge is not supported by the underlying warehouse, and hence not supported by dbt-impala

Example: Using partition_by config option

impala_partition_by.sql
{{
config(
materialized='table',
unique_key='id',
partition_by=['city'],
)
}}

with source_data as (
select 1 as id, "Name 1" as name, "City 1" as city,
union all
select 2 as id, "Name 2" as name, "City 2" as city,
union all
select 3 as id, "Name 3" as name, "City 2" as city,
union all
select 4 as id, "Name 4" as name, "City 1" as city,
)

select * from source_data

In the above example, a sample table is created with partition_by and other config options. One thing to note when using partition_by option is that the select query should always have the column name used in partition_by option as the last one, as can be seen for the city column name used in the above query. If the partition_by clause is not the same as the last column in select statement, Impala will flag an error when trying to create the model.