Documentation

Overview

Good documentation for your dbt models will help downstream consumers discover and understand the datasets which you curate for them.

dbt provides a way to generate documentation for your dbt project and render it as a website. The documentation for your project includes:

  • Information about your project, including model code, a DAG of your project, any tests you've added to a column, and more.
  • Information about your data warehouse, including column data types and table sizes. This information is generated by running queries against the information schema.

Here's a screenshot of an example docs site (you can find the whole site here):

Auto-generated dbt documentation website

Importantly, dbt also provides a way to add descriptions to models, columns, sources, and more, to further enhance your documentation.

Creating documentation for the first time

If you're new to dbt, we recommend that you check out our Getting Started Tutorial to build your first dbt project, complete with documentation.

Adding descriptions to your project

To add descriptions to your project, use the description: key in the same files where you declare tests, like so:

models/<filename>.yml
version: 2
models:
  - name: events
    description: This table contains clickstream events from the marketing website
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null
      - name: user-id
        quote: true
        description: The user who performed the event
        tests:
          - not_null

Generating project documentation

You can generate a documentation site for your project (with or without descriptions) using the CLI.

First, run dbt docs generate — this command tells dbt to compile relevant information about your dbt project and warehouse into manifest.json and catalog.json files respectively. To see documentation for all columns and not just columns described in your project, ensure that you have created the models with dbt run beforehand.

Then, run dbt docs serve to use these .json files to populate a local website.
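Once dbt docs generate has written manifest.json, that file can be inspected to find models that still lack descriptions. The following is a minimal sketch, assuming the manifest has a top-level `nodes` mapping whose entries carry `resource_type`, `name`, `description`, and `columns` keys. The function names `undocumented` and `load_manifest`, and the default path `target/manifest.json`, are illustrative assumptions and not part of dbt itself.

```python
import json
from pathlib import Path


def undocumented(manifest: dict) -> dict:
    """Map each model name to the items that have no description.

    Assumes the manifest layout described in the lead-in; adjust the keys
    if your dbt version writes a different structure.
    """
    missing = {}
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, and other non-model resources.
        if node.get("resource_type") != "model":
            continue
        gaps = []
        if not node.get("description"):
            gaps.append("<model>")
        for column_name, column in node.get("columns", {}).items():
            if not column.get("description"):
                gaps.append(column_name)
        if gaps:
            missing[node["name"]] = gaps
    return missing


def load_manifest(path: str = "target/manifest.json") -> dict:
    # Hypothetical default location; pass your own path if it differs.
    return json.loads(Path(path).read_text())
```

A typical call would be `undocumented(load_manifest())`, run from the project root after dbt docs generate has finished.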

Note: We're adding a way to see docs in the dbt Cloud IDE soon! You can generate docs as part of your scheduled runs in the IDE — see this guide.

FAQs

 Are there any example dbt documentation sites?
 Do I need to add a YAML entry for a column for it to appear in the docs site?
 How do I write long-form explanations in my descriptions?
 How do I share my documentation with my team members?
 Can I document things other than models, like sources, seeds, and snapshots?

Using Docs Blocks

Syntax

To declare a docs block, use the jinja docs tag. Docs blocks must be uniquely named, and can contain arbitrary markdown. In practice, a docs block might look like this:

events.md
{% docs table_events %}
This table contains clickstream events from the marketing website.
The events in this table are recorded by [Snowplow](http://github.com/snowplow/snowplow) and piped into the warehouse on an hourly basis. The following pages of the marketing site are tracked:
- /
- /about
- /team
- /contact-us
{% enddocs %}

In the above example, a docs block named table_events is defined with some descriptive markdown contents. There is nothing significant about the name table_events — docs blocks can be named however you like, as long as the name only contains alphanumeric and underscore characters.
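The naming rules above (unique names, made up only of alphanumeric and underscore characters) can be checked before running dbt. The following is a rough sketch that scans markdown text with a regular expression; it is not how dbt parses docs blocks, and the helper name `check_docs_names` is hypothetical.

```python
import re
from collections import Counter

# Matches an opening tag such as {% docs table_events %} and captures the name.
DOCS_TAG = re.compile(r"\{%-?\s*docs\s+(\S+?)\s*-?%\}")
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_docs_names(markdown: str) -> dict:
    """Report docs block names that break the naming rules."""
    counts = Counter(DOCS_TAG.findall(markdown))
    return {
        # Names containing anything other than letters, digits, or underscores.
        "invalid": sorted(n for n in counts if not VALID_NAME.fullmatch(n)),
        # Names declared more than once in the scanned text.
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
    }
```

To cover a whole project, the text of every .md file would need to be concatenated (or the counts merged) before checking for duplicates, since uniqueness applies across files.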

Placement

Docs blocks should be placed in files with a .md file extension. By default, dbt will search in all resource paths for docs blocks (i.e. the combined list of source-paths, data-paths, analysis-paths, macro-paths and snapshot-paths) — you can adjust this behavior using the docs-paths config.

Usage

To use a docs block, reference it from your schema.yml file with the doc() function. Using the examples above, the table_events docs can be included in the schema.yml file as shown below:

schema.yml
version: 2
models:
  - name: events
    description: '{{ doc("table_events") }}'
    columns:
      - name: event_id
        description: This is a unique identifier for the event
        tests:
          - unique
          - not_null

In the resulting documentation, '{{ doc("table_events") }}' will be expanded to the markdown defined in the table_events docs block.
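To make the expansion step concrete, here is a simplified illustration of the idea in plain Python. It is not dbt's implementation (dbt renders descriptions with Jinja); the helpers `collect_docs` and `expand` are hypothetical and only handle the basic `{{ doc("name") }}` form.

```python
import re

# Captures the name and body of each {% docs name %} ... {% enddocs %} block.
DOCS_BLOCK = re.compile(
    r"\{%\s*docs\s+(\w+)\s*%\}(.*?)\{%\s*enddocs\s*%\}", re.DOTALL
)
# Matches {{ doc("name") }} or {{ doc('name') }} and captures the name.
DOC_CALL = re.compile(r"\{\{\s*doc\(\s*[\"'](\w+)[\"']\s*\)\s*\}\}")


def collect_docs(markdown: str) -> dict:
    """Build a name -> markdown body lookup from docs blocks."""
    return {name: body.strip() for name, body in DOCS_BLOCK.findall(markdown)}


def expand(description: str, docs: dict) -> str:
    """Replace each doc() reference with the body of the named block."""
    # A KeyError here means the description references an undefined block.
    return DOC_CALL.sub(lambda match: docs[match.group(1)], description)
```

With the events.md example, `expand('{{ doc("table_events") }}', collect_docs(text))` would return the markdown body of the table_events block.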

Setting a custom overview

The "overview" shown in the documentation website can be overridden by supplying your own docs block called __overview__. By default, dbt supplies an overview with helpful information about the docs site itself. Depending on your needs, it may be a good idea to override this docs block with specific information about your company style guide, links to reports, or information about who to contact for help. To override the default overview, create a docs block that looks like this:

models/overview.md
{% docs __overview__ %}
# Monthly Recurring Revenue (MRR) playbook.
This dbt project is a worked example to demonstrate how to model subscription
revenue. **Check out the full write-up [here](https://blog.getdbt.com/modeling-subscription-revenue/),
as well as the repo for this project [here](https://github.com/fishtown-analytics/mrr-playbook/).**
...
{% enddocs %}

Custom project-level overviews

You can set different overviews for each dbt project/package included in your documentation site by creating a docs block named __[project_name]__. For example, in order to define custom overview pages that appear when a viewer navigates inside the dbt_utils or snowplow package:

models/overview.md
{% docs __dbt_utils__ %}
# Utility macros
Our dbt project heavily uses this suite of utility macros, especially:
- `surrogate_key`
- `test_equality`
- `pivot`
{% enddocs %}

{% docs __snowplow__ %}
# Snowplow sessionization
Our organization uses this package of transformations to roll Snowplow events
up to page views and sessions.
{% enddocs %}

Navigating the documentation site

Using the docs interface, you can navigate to the documentation for a specific model. That might look something like this:

Auto-generated documentation for a dbt model

Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model.

From a docs page, you can click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane (shown below) will display the immediate parents and children of the model that you're exploring.

Opening the DAG mini-map

In this example, the fct_subscription_transactions model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the --models and --exclude flags, which are consistent with the semantics of model selection syntax. Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers.

The full lineage for a dbt model

Deploying the documentation site

Security

The dbt docs serve command is only intended for local/development hosting of the documentation site. Please use one of the methods listed below (or similar) to ensure that your documentation site is hosted securely!

dbt's documentation website was built in a way that makes it easy to host on the web. The site itself is "static", meaning that you don't need any type of "dynamic" server to serve the docs. Some common methods for hosting the docs are:

  1. dbt Cloud
  2. Host on S3 (optionally with IP access restrictions)
  3. Publish on Netlify
  4. Spin up a web server like Apache/Nginx
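Because the site is static, publishing it to any of these hosts mostly comes down to copying a few files. The sketch below gathers them into a folder that can then be uploaded. It assumes the site consists of index.html, manifest.json, and catalog.json in the project's target directory; that file list, the directory names, and the helper `collect_site` are assumptions for illustration, so check them against your own dbt version.

```python
import shutil
from pathlib import Path

# Assumption: these are the files the static docs site needs.
SITE_FILES = ("index.html", "manifest.json", "catalog.json")


def collect_site(target_dir: str, publish_dir: str) -> list:
    """Copy the docs site files into publish_dir and return their new paths."""
    source = Path(target_dir)
    destination = Path(publish_dir)
    # Fail early if dbt docs generate has not produced everything yet.
    missing = [name for name in SITE_FILES if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {source}: {', '.join(missing)}")
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in SITE_FILES:
        shutil.copy2(source / name, destination / name)
        copied.append(destination / name)
    return copied
```

For example, `collect_site("target", "public")` would leave a `public` folder holding only the site files, which keeps other artifacts in the target directory out of the upload.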