Apache Impala configurations
Configuring tables
When materializing a model as table
, you may include several optional configs that are specific to the dbt-impala plugin, in addition to the standard model configs.
Option | Description | Required? | Example |
---|---|---|---|
partition_by | partition by a column, typically a directory per partition is created | No | partition_by=['name'] |
sort_by | sort by a column | No | sort_by=['age'] |
row_format | format to be used when storing individual arows | No | row_format='delimited' |
stored_as | underlying storage format of the table | No | stored_as='PARQUET' |
location | storage location, typically an hdfs path | No | LOCATION='/user/etl/destination' |
comment | comment for the table | No | comment='this is the cleanest model' |
serde_properties | SerDes ([de-]serialization) prperties of table | No | serde_properties="('quoteChar'=''', 'escapeChar'='\')" |
tbl_properties | any metadata can be stored as key/value pair with the table | No | tbl_properties="('dbt_test'='1')" |
is_cached | true or false - if this table is cached | No | is_cached=false (default) |
cache_pool | cache pool name to use if is_cached is set to true | No | |
replication_factor | cache replication factor to use if is_cached is set to true | No | |
external | is this an external table - true / false | No | external=true |
For Cloudera specific options for above parameters see documentation of CREATE TABLE (https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_create_table.html)
Incremental models
Supported modes for incremental model:
append
(default): Insert new records without updating or overwriting any existing data.insert_overwrite
: For new records, insert data. When used along with partition clause, update data for changed record and insert data for new records.
Unsupported modes:
unique_key
This is not suppored option for incremental models in dbt-impalamerge
: Merge is not supported by the underlying warehouse, and hence not supported by dbt-impala
Example: Using partition_by config option
{{
config(
materialized='table',
unique_key='id',
partition_by=['city'],
)
}}
with source_data as (
select 1 as id, "Name 1" as name, "City 1" as city,
union all
select 2 as id, "Name 2" as name, "City 2" as city,
union all
select 3 as id, "Name 3" as name, "City 2" as city,
union all
select 4 as id, "Name 4" as name, "City 1" as city,
)
select * from source_data
In the above example, a sample table is created with partition_by and other config options. One thing to note when using partition_by option is that the select query should always have the column name used in partition_by option as the last one, as can be seen for the city
column name used in the above query. If the partition_by clause is not the same as the last column in select statement, Impala will flag an error when trying to create the model.
Was this page helpful?
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.