List of FAQs

#1. What other model configurations are there?

You can also configure:

  • tags to support easy categorization and graph selection
  • custom schemas to split your models across multiple schemas
  • aliases if your view/table name should differ from the filename
  • Snippets of SQL to run at the start or end of a model, known as hooks
  • Warehouse-specific configurations for performance (e.g. sort and dist keys on Redshift, partitions on BigQuery)

Check out the docs on models to learn more.

#2. What materializations are available in dbt?

dbt ships with four materializations: view, table, incremental and ephemeral. Check out the documentation on materializations for more information on each of these options.

You can also create your own custom materializations, if required however this is an advanced feature of dbt.

#3. What tests are available for me to use in dbt?

Out of the box, dbt ships with the following tests:

  • unique
  • not_null
  • accepted_values
  • relationships (i.e. referential integrity)

You can also write your own custom schema tests.

Some additional custom schema tests have been open-sourced in the dbt-utils package, check out the docs on packages to learn how to make these tests available in your project.

#4. How do I build one seed at a time?

As of v0.16.0, you can use a --select option with the dbt seed command, like so:

$ dbt seed --select country_codes

There is also an --exclude option.

Check out more in the model selection syntax documentation.

Prior to v0.16.0, there was no way to build one seed at a time.

#5. How can I see the SQL that dbt is running?

To check out the SQL that dbt is running, you can look in:

  • dbt Cloud:
    • Within the run output, click on a model name, and then select "Details"
  • dbt CLI:
    • The target/compiled/ directory for compiled select statements
    • The target/run/ directory for compiled create statements
    • The logs/dbt.log file for verbose logging.

#6. Can I store my seeds in a directory other than the `data` directory in my project?

Be default, dbt expects your seed files to be located in the data subdirectory of your project.

To change this, update the data-paths configuration in your dbt_project.yml file, like so:

dbt_project.yml
data-paths: ["seeds"]

#7. Do I need to create my target schema before running dbt?

Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you.

#8. How do I create dependencies between models?

When you use the ref function, dbt automatically infers the dependencies between models.

For example, consider a model, customer_orders, like so:

models/customer_orders.sql
select
customer_id,
min(order_date) as first_order_date,
max(order_date) as most_recent_order_date,
count(order_id) as number_of_orders
from {{ ref('stg_orders') }}
group by 1

There's no need to explicitly define these dependencies. dbt will understand that the stg_orders model needs to be built before the above model (customer_orders). When you execute dbt run, you will see these being built in order:

$ dbt run
Running with dbt=0.16.0
Found 2 models, 28 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 3 sources
11:42:52 | Concurrency: 8 threads (target='dev_snowflake')
11:42:52 |
11:42:52 | 1 of 2 START view model dbt_claire.stg_jaffle_shop__orders........... [RUN]
11:42:55 | 1 of 2 OK creating view model dbt_claire.stg_jaffle_shop__orders..... [CREATE VIEW in 2.50s]
11:42:55 | 2 of 2 START relation dbt_claire.customer_orders..................... [RUN]
11:42:56 | 2 of 2 OK creating view model dbt_claire.customer_orders............. [CREATE VIEW in 0.60s]
11:42:56 | Finished running 2 view models in 15.13s.
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

To get some practice with this, we recommend you complete the tutorial to build your first dbt project

#9. What privileges does my database user need to use dbt?

Your user will need to be able to:

  • select from raw data in your warehouse (i.e. data to be transformed)
  • create schemas, and therefore create tables/views within that schema¹
  • read system views to generate documentation (i.e. views in information_schema)

On Postgres, Redshift, and Snowflake, use a series of grants to ensure that your user has the correct privileges.

On BigQuery, use the "BigQuery User" role to assign these privileges.


¹Alternatively, a separate user can create a schema for the dbt user, and then grant the user privileges to create within this schema. We generally recommend granting your dbt user the ability to create schemas, as it is less complicated to implement.

#10. What happens if one of my runs fails?

If you're using dbt Cloud, we recommend setting up email and Slack notifications (Account Settings > Notifications) for any failed runs. Then, debug these runs the same way you would debug any runs in development.

#11. One of my tests failed, how can I debug it?

To debug a failing test, find the SQL that dbt ran by:

  • dbt Cloud:
    • Within the test output, click on the failed test, and then select "Details"
  • dbt CLI:
    • Open the file path returned as part of the error message.
    • Navigate to the target/compiled/schema_tests directory for all compiled test queries

Copy the SQL into a query editor (in dbt Cloud, you can paste it into a new Statement), and run the query to find the records that failed.

#12. The columns of my seed changed, and now I get an error when running the `seed` command, what should I do?

If you changed the columns of your seed, you may get a Database Error:

$ dbt seed
Running with dbt=0.16.0-rc2
Found 0 models, 0 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 1 seed file, 0 sources
12:12:27 | Concurrency: 8 threads (target='dev_snowflake')
12:12:27 |
12:12:27 | 1 of 1 START seed file dbt_claire.country_codes...................... [RUN]
12:12:30 | 1 of 1 ERROR loading seed file dbt_claire.country_codes.............. [ERROR in 2.78s]
12:12:31 |
12:12:31 | Finished running 1 seed in 10.05s.
Completed with 1 error and 0 warnings:
Database Error in seed country_codes (data/country_codes.csv)
000904 (42000): SQL compilation error: error line 1 at position 62
invalid identifier 'COUNTRY_NAME'
Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

In this case, you should rerun the command with a --full-refresh flag, like so:

dbt seed --full-refresh

Why is this the case?

When you typically run dbt seed, dbt truncates the existing table and reinserts the data. This pattern avoids a drop cascade command, which may cause downstream objects (that your BI users might be querying!) to get dropped.

However, when column names are changed, or new columns are added, these statements will fail as the table structure has changed.

The --full-refresh flag will force dbt to drop cascade the existing table before rebuilding it.

#14. How do I preserve leading zeros in a seed?

If you need to preserve leading zeros (for example in a zipcode or mobile number):

  1. v0.16.0 onwards: Include leading zeros in your seed file, and use the column_types configuration with a varchar datatype of the correct length.
  2. Prior to v0.16.0: Use a downstream model to pad the leading zeros using SQL, for example: lpad(zipcode, 5, '0')

#15. Can I use seeds to load raw data?

Seeds should not be used to load raw data (for example, large CSV exports from a production database).

Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.

Loading CSVs using dbt's seed functionality is not performant for large files. Consider using a different tool to load these CSVs into your data warehouse.

#16. How do I load data into my warehouse?

dbt assumes that you already have a copy of your data, in your data warehouse. We recommend you use an off-the-shelf tool like Stitch or Fivetran to get data into your warehouse.

Can dbt be used to load data?

No, dbt does not extract or load data. It focuses on the transformation step only.

#17. How do I write long-form explanations in my descriptions?

If you need more than a sentence to explain a model, you can: 1. Split your description over multiple lines (yaml docs), like so:

version: 2
models:
- name: customers
description: >
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat.
  1. Use a docs block to write the description in a Markdown file.

#18. Do all my tests go in one file?

No! You can use as many files as you want! Some folks find it useful to have one file per model, we tend to have one per directory.

#19. Can I use environment variables in my profile?

Yes! Check out the docs on environment variables for more information.

#20. What should I name my profile?

We typically use a company name for a profile name, and then use targets to differentiate between dev and prod. Check out the docs on managing environments for more information.

#21. What tests should I add to my project?

We recommend that every model has a test on a primary key, that is, a column that is unique and not_null.

We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.

In advanced dbt projects, we recommend using sources and running these source data-integrity tests against the sources rather than models.

#22. How do I remove deleted models from my data warehouse?

If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users.

(This can also happen when you switch a model from being a view or table, to ephemeral)

When you remove models from your dbt project, you should manually drop the related relations from your schema.

#23. How do I run models downstream of a seed?

You can run models downstream of a seed using the model selection syntax, and treating the seed like a model.

For example, the following would run all models downstream of a seed named country_codes:

$ dbt run --models country_codes+

#24. If I rerun dbt, will there be any downtime as models are rebuilt?

Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime).

The implementation of this varies on each warehouse, check out the logs to see the SQL dbt is executing.

#25. How do I run one model at a time?

To run one model, use the --models flag (or -m flag), followed by the name of the model:

$ dbt run --models customers

Check out the model selection syntax documentation for more operators and examples.

#26. What should my profiles.yml file look like for my warehouse?

The structure of a profile looks different on each warehouse. Check out the supported databases page, and navigate to the Profile Setup section for your warehouse.

#27. Does my test file need to be named `schema.yml`?

No! You can name this file whatever you want (including whatever_you_want.yml), so long as:

  • The file is in your models/ directory
  • The file has .yml extension
  • The yaml follows the basic structure.

Check out the docs for more information.

#28. Can I put my seeds in a schema other than my target schema?

Or: Can I split my seeds across multiple schemas?

Yes! The schema configuration can be applied to seeds in your dbt_project.yml file.

dbt_project.yml
name: jaffle_shop
...
seeds:
jaffle_shop:
schema: mappings # all seeds in this project will use the mapping schema by default
marketing:
schema: marketing # seeds in the `data/mapping/ subdirectory will use the marketing schema

#29. How do I set a datatype for a column in my seed?

dbt will infer the datatype for each column based on the data in your CSV.

You can also explicitly set a datatype using the column_types configuration like so:

dbt_project.yml
seeds:
jaffle_shop: # you must include the project name
warehouse_locations:
column_types:
zipcode: varchar(5)

#30. Do hooks run with seeds?

Yes! The following hooks are available:

Configure these in your dbt_project.yml file.

#31. Why are profiles stored outside of my project?

Profiles are stored separately to dbt projects to avoid checking credentials into version control. Database credentials are extremely sensitive information and should never be checked into version control.

#32. How do I share my documentation with my team members?

If you're using dbt Cloud to deploy your project, and have the Team Plan, you can have up to 50 read only users, who will be able access the documentation for your project.

#33. What happens if the SQL in my query is bad?

Or:

I got a Database Error, what does that mean?

If there's a mistake in your SQL, dbt will return the error that your database returns.

$ dbt run --models customers
Running with dbt=0.15.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources
14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.
Completed with 1 error and 0 warnings:
Database Error in model customers (models/customers.sql)
Syntax error: Expected ")" but got identifier `grand-highway-265418` at [13:15]
compiled SQL at target/run/jaffle_shop/customers.sql
Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Any models downstream of this model will also be skipped. Use the error message and the compiled SQL to debug any errors.

#34. How should I structure my project?

There's no one best way to structure a project! Every organization is unique.

If you're just getting started, check out how we (Fishtown Analytics) structure our dbt projects.

#35. What should I name my target?

We typically use targets to differentiate between development and production runs of dbt, naming the targets dev and prod respectively. Check out the docs on managing environments for more information.

#36. How do I test one model at a time?

Running tests on one model looks very similar to running a model: use the --models flag (or -m flag), followed by the name of the model:

dbt test --models customers

Check out the model selection syntax documentation for more operators and examples.

#37. How do I test and document seeds?

To test and document seeds, use a schema file and nest the configurations under a seeds: key

Example:

data/schema.yml
version: 2
seeds:
- name: country_codes
description: A mapping of two letter country codes to country names
columns:
- name: country_code
tests:
- unique
- not_null
- name: country_name
tests:
- unique
- not_null

#38. Do model names need to be unique?

Yes! To build dependencies between models, you need to use the ref function. The ref function only takes one argument — the model name (i.e. the filename). As a result, these model names need to be unique, even if they are in distinct folders.

Often, this question comes up because users want to give two models the same name in their warehouse, splitting them across separate schemas (e.g. stripe.users and app.users). Checkout the docs on custom aliases and custom schemas to achieve this.

#39. I got an "unused model configurations" error message, what does this mean?

You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this article to understand more.

#40. When should I run my tests?

You should run your tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid).

#41. Which materialization should I use for my model?

Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed).

Check out the docs on materializations for advice on when to use each materialization.

#42. How did dbt choose which schema to build my models in?

By default, dbt builds models in your target schema. To change your target schema:

  • If you're developing in dbt Cloud, these are set for each user when you first use a development environment.
  • If you're developing with the dbt CLI, this is the schema: parameter in your profiles.yml file.

If you wish to split your models across multiple schemas, check out the docs on using custom schemas

Note: on BigQuery, dataset is used interchangeably with schema.