Project state in dbt

dbt deploys projects in a stateful way. The artifacts that record this state are accessible programmatically via the Discovery API in the metadata platform.

With the implementation of the environment endpoint in the Discovery API, we've introduced the idea of multiple states. The Discovery API provides a single API endpoint that returns the latest state of models, sources, and other nodes in the DAG.

A single deployment environment should represent the production state of a given dbt project.

There are two states that can be queried in dbt:

Applied state refers to what exists in the data warehouse after a successful dbt run: the model build succeeds, and the model now exists as a table in the warehouse.
Definition state refers to what exists in the project according to the code defined in it (for example, the manifest state). It hasn’t necessarily been executed in the data platform and may be just the result of dbt compile.

Definition (logical) vs. applied state of dbt nodes

In a dbt project, a node’s definition state represents the configuration, transformations, and dependencies defined in the SQL and YAML files. It captures how the node should be processed in relation to other nodes and tables in the data warehouse, and it may be produced by dbt build, run, parse, or compile. It changes whenever the project code changes.

A node’s applied state refers to the node’s actual state after it has been successfully executed in the DAG. For example, when models are executed with dbt run or dbt build, their state is applied to the data warehouse. The applied state changes whenever a node is executed. It represents the result of the transformations and the actual data stored in the database, which for models can be a table or a view based on the defined logic.

The applied state includes execution info: metadata about how the node arrived in the applied state. It describes the most recent execution (successful or attempted), including when it began, its status, and how long it took.

Here’s how you’d query and compare the definition vs. applied state of a model using the Discovery API:

query Compare($environmentId: Int!, $first: Int!) {
	environment(id: $environmentId) {
		definition {
			models(first: $first) {
				edges {
					node {
						name
						rawCode
					}
				}
			}
		}
		applied {
			models(first: $first) {
				edges {
					node {
						name
						rawCode 
						executionInfo {
							executeCompletedAt
						}
					}
				}
			}
		}
	}
}

Most Discovery API use cases will favor the applied state since it pertains to what has actually been run and can be analyzed.
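
As a rough illustration of how the comparison query above might be used from a script, the following Python sketch builds the HTTP request and then compares rawCode between the two states. Several things here are assumptions rather than part of this page: the endpoint URL (it varies by region and account), the use of a bearer token for authentication, and the helper names build_request and drifted_models, which are hypothetical. It also assumes model names are unique within the environment.

```python
import json
import urllib.request

# Assumed endpoint; check the URL for your own region or account.
DISCOVERY_URL = "https://metadata.cloud.getdbt.com/graphql"

# Same query as above, with only whitespace changed.
COMPARE_QUERY = """
query Compare($environmentId: Int!, $first: Int!) {
  environment(id: $environmentId) {
    definition { models(first: $first) { edges { node { name rawCode } } } }
    applied {
      models(first: $first) {
        edges { node { name rawCode executionInfo { executeCompletedAt } } }
      }
    }
  }
}
"""


def build_request(token, environment_id, first=100, url=DISCOVERY_URL):
    """Build (but do not send) the POST request for the comparison query."""
    body = json.dumps({
        "query": COMPARE_QUERY,
        "variables": {"environmentId": environment_id, "first": first},
    })
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def _by_name(state):
    """Index the model nodes of one state (definition or applied) by name."""
    return {edge["node"]["name"]: edge["node"] for edge in state["models"]["edges"]}


def drifted_models(environment):
    """Return sorted names of models whose definition rawCode differs from
    the applied rawCode, or that are defined but have no applied state."""
    definition = _by_name(environment["definition"])
    applied = _by_name(environment["applied"])
    return sorted(
        name
        for name, node in definition.items()
        if name not in applied or applied[name]["rawCode"] != node["rawCode"]
    )


# Example usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_request("<service token>", 123)) as resp:
#     environment = json.load(resp)["data"]["environment"]
# print(drifted_models(environment))
```

Keeping the request construction separate from the comparison means the comparison can be tested against a saved response without network access.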

Affected states by node type

The following table shows, for each dbt node type, how the node is executed and which states are available for it in the Discovery API.

Node	Executed in DAG	Created by execution	Exists in database	Lineage	States
Analysis	No	No	No	Upstream	Definition
Data test	Yes	Yes	No	Upstream	Applied & definition
Exposure	No	No	No	Upstream	Definition
Group	No	No	No	Downstream	Definition
Macro	Yes	No	No	N/A	Definition
Metric	No	No	No	Upstream & downstream	Definition
Model	Yes	Yes	Yes	Upstream & downstream	Applied & definition
Saved queries (not in API)	N/A	N/A	N/A	N/A	N/A
Seed	Yes	Yes	Yes	Downstream	Applied & definition
Semantic model	No	No	No	Upstream & downstream	Definition
Snapshot	Yes	Yes	Yes	Upstream & downstream	Applied & definition
Source	Yes	No	Yes	Downstream	Applied & definition
Unit tests	Yes	Yes	No	Downstream	Definition
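
When writing a client, it can help to encode the States column of the table above so that code asks for the applied state only where one exists. The sketch below is a hand transcription of that column; the names NODE_STATES and has_applied_state are hypothetical and not part of any dbt API. Saved queries are omitted because the table lists them as not available in the API.

```python
# States column of the table above, transcribed by hand (illustrative only).
NODE_STATES = {
    "analysis": {"definition"},
    "data test": {"applied", "definition"},
    "exposure": {"definition"},
    "group": {"definition"},
    "macro": {"definition"},
    "metric": {"definition"},
    "model": {"applied", "definition"},
    "seed": {"applied", "definition"},
    "semantic model": {"definition"},
    "snapshot": {"applied", "definition"},
    "source": {"applied", "definition"},
    "unit test": {"definition"},
}


def has_applied_state(node_type: str) -> bool:
    """Return True if the table lists an applied state for this node type.

    Unknown node types (including saved queries) return False.
    """
    return "applied" in NODE_STATES.get(node_type.strip().lower(), set())
```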

Caveats about state/metadata updates

Over time, Cloud Artifacts will provide information to maintain state for features/services in dbt and enable you to access state in dbt and its downstream ecosystem. Cloud Artifacts is currently focused on the latest production state, but this focus will evolve.

Here are some limitations of the state representation in the Discovery API:

Users must access the default production environment to know the latest state of a project.
The API gets the definition from the latest manifest generated in a given deployment environment, but that often won’t reflect the latest project code state.
Compiled code results may be outdated depending on dbt run step order and failures.
Catalog info can be outdated or incomplete (in the applied state), depending on whether and when docs generate was last run.
Source freshness checks can be out of date (in the applied state), depending on when the source freshness command was last run, because that command is not included in build.
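
Because the applied state can lag behind, a consumer may want to check how old it is before relying on it. The following Python sketch flags models whose last execution is older than a chosen threshold, using the executeCompletedAt field from the query earlier on this page. It assumes the timestamp is an ISO 8601 string (possibly ending in Z) and that a missing timestamp means the model has not been executed; the function name stale_models is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_models(applied_models, max_age, now=None):
    """Return names of models last executed more than max_age ago.

    applied_models is the "models" object from the applied state, that is,
    a dict with an "edges" list. Models with no completion timestamp are
    treated as stale.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for edge in applied_models["edges"]:
        node = edge["node"]
        info = node.get("executionInfo") or {}
        completed = info.get("executeCompletedAt")
        if completed is None:
            stale.append(node["name"])  # assumed: never executed
            continue
        # Assumed ISO 8601 format; normalize a trailing "Z" for older Pythons.
        finished = datetime.fromisoformat(completed.replace("Z", "+00:00"))
        if finished.tzinfo is None:
            finished = finished.replace(tzinfo=timezone.utc)  # assume UTC
        if now - finished > max_age:
            stale.append(node["name"])
    return stale
```

For example, stale_models(environment["applied"]["models"], timedelta(days=1)) would list models that have not completed a run in the last day. This check covers model execution only; it says nothing about catalog or source freshness metadata.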
