Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. While some systems are built in a way that makes accessing historical data possible, this is often not the case. dbt provides a mechanism, `dbt archive`, to record changes to a mutable table over time.
To use `dbt archive`, declare the tables you want to archive in your `dbt_project.yml` file:

```yaml
...

archive:
    - source_schema: production_data   # schema to look for tables in (declared below)
      target_schema: dbt_archive       # where to archive the data to
      tables:                          # list of tables to archive
          - source_table: users              # table to archive (production_data.users)
            target_table: users_archived     # table to insert archived data into (dbt_archive.users_archived)
            updated_at: updated_at           # used to determine when data has changed
            unique_key: id                   # used to generate archival query

          - source_table: some_other_table             # production_data.some_other_table
            target_table: some_other_table_archive     # dbt_archive.some_other_table_archive
            updated_at: "updatedAt"
            unique_key: "expressions || work || LOWER(too)"

    - source_schema: some_other_schema
      ...
...
```
The archived tables will mirror the schema of the source tables they are generated from. In addition, four fields are added to the archive table:
- `valid_from`: The timestamp when this archived row was inserted (and first considered valid)
- `valid_to`: The timestamp when this archived row became invalidated. The first archived record for a given `unique_key` has `valid_to = NULL`. When newer data is archived for that `unique_key`, the `valid_to` field of the old record is set to the `valid_from` field of the new record.
- `scd_id`: A unique key generated for each archived record. SCD = Slowly Changing Dimension.
- `dbt_updated_at`: The update timestamp of the source record when the archived row was inserted.
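Conceptually, each archival run applies slowly-changing-dimension (type 2) logic: invalidate the current archived row when its source row has changed, then insert a new version. The following SQL is an illustrative sketch only, using the table and column names from the example config above; the query dbt actually generates may differ.

```sql
-- Illustrative sketch of the archival step for production_data.users.
-- Not the SQL dbt generates; names come from the example config above.

-- 1. Invalidate archived rows whose source row has since been updated:
--    the old record's valid_to becomes the new record's valid_from.
update dbt_archive.users_archived as archived
set valid_to = source.updated_at
from production_data.users as source
where archived.id = source.id
  and archived.valid_to is null
  and source.updated_at > archived.dbt_updated_at;

-- 2. Insert a new "current" version for new or changed source rows.
insert into dbt_archive.users_archived
select
    source.*,
    source.updated_at        as valid_from,
    cast(null as timestamp)  as valid_to,
    md5(cast(source.id as varchar) || '|' ||
        cast(source.updated_at as varchar)) as scd_id,
    source.updated_at        as dbt_updated_at
from production_data.users as source
left join dbt_archive.users_archived as archived
  on archived.id = source.id
 and archived.valid_to is null
where archived.id is null;
```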
dbt models can be built on top of these archived tables. The most recent record for a given `unique_key` is the one where `valid_to` is `null`.
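For example, a dbt model that reconstructs the current state of each user from the archive might look like the following (schema and table names are taken from the example config above):

```sql
-- Current state of each user: the archived version that
-- has not yet been invalidated by a newer record.
select *
from dbt_archive.users_archived
where valid_to is null
```

Point-in-time queries work the same way: filter to rows where the timestamp of interest falls between `valid_from` and `valid_to` (treating a `null` `valid_to` as "still valid").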
To run this archive process, use the command `dbt archive`. After testing and confirming that the archival works, you should schedule this process to run on a recurring basis.