Why define a contract?
Defining a dbt model is as easy as writing a SQL select statement. Your query naturally produces a dataset with columns of names and types based on the columns you select and the transformations you apply.
While this is ideal for quick and iterative development, for some models, constantly changing the shape of the returned dataset poses a risk when other people and processes are querying that model. It's better to define a set of upfront "guarantees" that define the shape of your model. We call this set of guarantees a "contract." While building your model, dbt will verify that your model's transformation will produce a dataset matching its contract, or it will fail to build.
Where are contracts supported?
At present, model contracts are supported for:
- SQL models. Contracts are not yet supported for Python models.
- Models materialized as table, view, and incremental (with on_schema_change: append_new_columns). Views offer limited support for column names and data types, but not constraints. Contracts are not supported for ephemeral-materialized models.
- Certain data platforms, but the supported and enforced constraints vary by platform.
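As a rough sketch of the incremental case, a model's properties might combine the materialization, the schema-change behavior, and the contract as shown below. The model name fct_orders, the column order_id, and its int type are hypothetical, and whether this combination is accepted depends on your data platform and dbt version.

```yaml
# Hypothetical properties file; fct_orders and order_id are illustrative names.
models:
  - name: fct_orders
    config:
      materialized: incremental
      # Schema-change setting named in the list above for incremental models.
      on_schema_change: append_new_columns
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: int   # assumed type; use one your platform understands
```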
How to define a contract
Let's say you have a model with a query like:
    -- lots of SQL
    final as (
        select
            customer_id, customer_name,
            -- ... many more ...
        from ...
    )
    select * from final
To enforce a model's contract, set enforced: true under the contract configuration.
When enforced, your contract must include every column's name and data_type (where data_type matches one that your data platform understands).
If your model is materialized as table or incremental, and depending on your data platform, you may optionally specify additional constraints, such as not_null (containing zero null values).
    models:
      - name: dim_customers
        columns:
          - name: customer_id
            constraints:
              - type: not_null
          - name: customer_name
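Putting these requirements together, a full contract for dim_customers might look like the following sketch. The int and string data types are assumptions for illustration; substitute the types your data platform understands.

```yaml
models:
  - name: dim_customers
    config:
      contract:
        enforced: true          # turn on contract enforcement for this model
    columns:
      - name: customer_id
        data_type: int          # assumed type
        constraints:
          - type: not_null      # optional; not available for views
      - name: customer_name
        data_type: string       # assumed type
      # ... one entry per remaining column returned by the query
```

Because an enforced contract must cover every column, any column returned by the query but missing from this list would be expected to fail the build.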
When building a model with a defined contract, dbt will do two things differently:
- dbt will run a "preflight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. This check is agnostic to the order of columns specified in your model (SQL) or YAML spec.
- dbt will include the column names, data types, and constraints in the DDL statements it submits to the data platform, which will be enforced while building or updating the model's table.
Which models should have contracts?
Any model meeting the criteria described above can define a contract. We recommend defining contracts for "public" models that are being relied on downstream.
- Inside of dbt: Shared with other groups, other teams, and (in the future) other dbt projects.
- Outside of dbt: Reports, dashboards, or other systems & processes that expect this model to have a predictable structure. You might reflect these downstream uses with exposures.
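One way to record such a downstream use is an exposure. The sketch below is illustrative only: the exposure name, owner, and email address are hypothetical, and it assumes the dim_customers model used as the example elsewhere on this page.

```yaml
exposures:
  - name: customer_overview_dashboard   # hypothetical name
    type: dashboard
    owner:
      name: Analytics Team              # hypothetical owner
      email: analytics@example.com      # placeholder address
    depends_on:
      - ref('dim_customers')            # the contracted model this dashboard reads
```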
How are contracts different from tests?
A model's contract defines the shape of the returned dataset. If the model's logic or input data doesn't conform to that shape, the model does not build.
Tests are a more flexible mechanism for validating the content of your model after it's built. So long as you can write the query, you can run the test. Tests are more configurable, such as with custom severity thresholds. They are easier to debug after finding failures, because you can query the already-built model, or store the failing records in the data warehouse.
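As an illustration of that configurability, a test can carry its own severity thresholds and store failing rows for later inspection. This is a sketch: the thresholds are arbitrary, and newer dbt versions use the key data_tests in place of tests.

```yaml
models:
  - name: dim_customers
    columns:
      - name: customer_name
        tests:
          - not_null:
              config:
                severity: error
                warn_if: ">10"        # warn when more than 10 rows fail
                error_if: ">100"      # error only when more than 100 rows fail
                store_failures: true  # save failing records in the warehouse
```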
In some cases, you can replace a test with its equivalent constraint. This has the advantage of guaranteeing the validation at build time, and it probably requires less compute (cost) in your data platform. The prerequisites for replacing a test with a constraint are:
- Making sure that your data platform can support and enforce the constraint that you need. Most platforms only enforce not_null.
- Materializing your model as table or incremental (not view or ephemeral).
- Defining a full contract for this model by specifying the name and data_type of each column.
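A sketch of such a replacement, shown as two YAML documents (before and after), might look like this. The table materialization and the int and string types are assumptions chosen for illustration, and newer dbt versions use data_tests in place of tests.

```yaml
# Before: not_null is validated by a test, after the model is built.
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - not_null
---
# After: not_null is enforced as a constraint while the model is built.
models:
  - name: dim_customers
    config:
      materialized: table
      contract:
        enforced: true
    columns:
      - name: customer_id
        data_type: int          # assumed type
        constraints:
          - type: not_null
      - name: customer_name
        data_type: string       # assumed type
```

The second version only provides a guarantee if your data platform actually enforces the constraint, so check that before removing the test.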
Why aren't tests part of the contract? To draw a parallel with software APIs, the structure of the API response is the contract. Quality and reliability ("uptime") are also very important attributes of an API, but they are not part of the contract per se. When the contract changes in a backwards-incompatible way, it is a breaking change that requires a bump in major version.
Can I define a "partial" contract?
Currently, dbt contracts apply to all columns defined in a model, and they require declaring explicit expectations about all of those columns. The explicit declaration of a contract is not an accident; it's very much the intent of this feature.
We are investigating the feasibility of supporting "inferred" or "partial" contracts in the future. This would enable you to define constraints and strict data typing for a subset of columns, while still detecting breaking changes on other columns by comparing against the same model in production. If you're interested, please upvote or comment on dbt-core#7432.