List of FAQs

#1. What other model configurations are there?

You can also configure:

  • tags to support easy categorization and graph selection
  • custom schemas to split your models across multiple schemas
  • aliases if your view/table name should differ from the filename
  • Snippets of SQL to run at the start or end of a model, known as hooks
  • Warehouse-specific configurations for performance (e.g. sort and dist keys on Redshift, partitions on BigQuery)
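
For instance, here's a hedged sketch combining several of these in a config block (the tag, schema, and alias values are hypothetical):

models/customers.sql
{{
  config(
    materialized='table',
    tags=['core'],
    schema='marts',
    alias='dim_customers'
  )
}}

select ...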

Check out the docs on model configurations to learn more.

#2. What materializations are available in dbt?

dbt ships with four materializations: view, table, incremental and ephemeral. Check out the documentation on materializations for more information on each of these options.

You can also create your own custom materializations if required; however, this is an advanced feature of dbt.
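
As a minimal sketch, a materialization can be set for a whole directory in dbt_project.yml, or per-model in a config block (the directory and model names here are hypothetical):

dbt_project.yml
models:
  jaffle_shop:
    marts:
      materialized: table

models/stg_orders.sql
{{ config(materialized='view') }}

select ...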

#3. What tests are available for me to use in dbt?

Out of the box, dbt ships with the following tests:

  • unique
  • not_null
  • accepted_values
  • relationships (i.e. referential integrity)
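
For example, here's a sketch of all four tests applied to a hypothetical orders model:

models/schema.yml
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id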

You can also write your own custom schema tests.

Some additional custom schema tests have been open-sourced in the dbt-utils package. Check out the docs on packages to learn how to make these tests available in your project.

#4. How do I build one seed at a time?

As of v0.16.0, you can use a --select option with the dbt seed command, like so:

$ dbt seed --select country_codes

There is also an --exclude option.

Check out more in the model selection syntax documentation.

Prior to v0.16.0, there was no way to build one seed at a time.

#5. How can I see the SQL that dbt is running?

To check out the SQL that dbt is running, you can look in:

  • dbt Cloud:
    • Within the run output, click on a model name, and then select "Details"
  • dbt CLI:
    • The target/compiled/ directory for compiled select statements
    • The target/run/ directory for compiled create statements
    • The logs/dbt.log file for verbose logging.

#6. What is the difference between dbt Core, the dbt CLI and dbt Cloud?

dbt Core is the software that takes a dbt project (.sql and .yml files) and a command and then creates tables/views in your warehouse. dbt Core includes a command line interface (CLI) so that users can execute dbt commands using a terminal program. dbt Core is open source and free to use.

dbt Cloud is an application that helps teams use dbt. dbt Cloud provides a web-based IDE to develop dbt projects, a purpose-built scheduler, and a way to share dbt documentation with your team. dbt Cloud offers a number of features for free, as well as additional features in paid tiers (check out the pricing here).

#7. Can I store my seeds in a directory other than the `data` directory in my project?

By default, dbt expects your seed files to be located in the data subdirectory of your project.

To change this, update the data-paths configuration in your dbt_project.yml file, like so:

dbt_project.yml
data-paths: ["seeds"]

#8. Can I store my models in a directory other than the `models` directory in my project?

By default, dbt expects your model files to be located in the models subdirectory of your project.

To change this, update the source-paths configuration in your dbt_project.yml file, like so:

dbt_project.yml
source-paths: ["transformations"]

#9. Can I store my snapshots in a directory other than the `snapshots` directory in my project?

By default, dbt expects your snapshot files to be located in the snapshots subdirectory of your project.

To change this, update the snapshot-paths configuration in your dbt_project.yml file, like so:

dbt_project.yml
snapshot-paths: ["snapshots"]

Note that you cannot co-locate snapshots and models in the same directory.

#10. Do I need to create my target schema before running dbt?

Nope! dbt will check if the schema exists when it runs. If the schema does not exist, dbt will create it for you.
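
Illustratively, dbt runs a statement equivalent to the following before building models (the exact SQL varies by warehouse, and dbt_alice is a hypothetical target schema):

create schema if not exists dbt_alice;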

#11. How do I create dependencies between models?

When you use the ref function, dbt automatically infers the dependencies between models.

For example, consider a model, customer_orders, like so:

models/customer_orders.sql
select
    customer_id,
    min(order_date) as first_order_date,
    max(order_date) as most_recent_order_date,
    count(order_id) as number_of_orders
from {{ ref('stg_orders') }}
group by 1

There's no need to explicitly define these dependencies. dbt will understand that the stg_orders model needs to be built before the above model (customer_orders). When you execute dbt run, you will see these being built in order:

$ dbt run
Running with dbt=0.16.0
Found 2 models, 28 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 0 seed files, 3 sources
11:42:52 | Concurrency: 8 threads (target='dev_snowflake')
11:42:52 |
11:42:52 | 1 of 2 START view model dbt_claire.stg_jaffle_shop__orders........... [RUN]
11:42:55 | 1 of 2 OK creating view model dbt_claire.stg_jaffle_shop__orders..... [CREATE VIEW in 2.50s]
11:42:55 | 2 of 2 START view model dbt_claire.customer_orders................... [RUN]
11:42:56 | 2 of 2 OK creating view model dbt_claire.customer_orders............. [CREATE VIEW in 0.60s]
11:42:56 | Finished running 2 view models in 15.13s.
Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

To get some practice with this, we recommend you complete the tutorial to build your first dbt project.

#12. What privileges does my database user need to use dbt?

Your user will need to be able to:

  • select from raw data in your warehouse (i.e. data to be transformed)
  • create schemas, and therefore create tables/views within that schema¹
  • read system views to generate documentation (i.e. views in information_schema)

On Postgres, Redshift, and Snowflake, use a series of grants to ensure that your user has the correct privileges.
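
As a sketch, assuming a Postgres/Redshift-style warehouse with a raw schema, an analytics database, and a user named dbt_user (all hypothetical), the grants might look like:

grant usage on schema raw to dbt_user;
grant select on all tables in schema raw to dbt_user;
grant create on database analytics to dbt_user;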

On BigQuery, use the "BigQuery User" role to assign these privileges.


¹Alternatively, a separate user can create a schema for the dbt user, and then grant the user privileges to create within this schema. We generally recommend granting your dbt user the ability to create schemas, as it is less complicated to implement.

#13. Are there any example dbt projects?

Yes!

  • Getting Started Tutorial: You can build your own example dbt project in the Getting Started Tutorial
  • Jaffle Shop: A demonstration project (closely related to the tutorial) for a fictional ecommerce store (source code)
  • MRR Playbook: A demonstration project that models subscription revenue (source code, docs)
  • Attribution Playbook: A demonstration project that models marketing attribution (source code, docs)
  • GitLab: Gitlab's internal dbt project is open source and is a great example of how to use dbt at scale (source code, docs)

If you have an example project to add to this list, suggest an edit.

#14. How do I exclude a table from a freshness snapshot?

Some tables in a data source may be updated infrequently. If you've set a freshness property at the source level, this table is likely to fail checks.

To work around this, you can use an empty dictionary (freshness: {}) or null (freshness: null) to "unset" the freshness for a particular table:

models/<filename>.yml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    loaded_at_field: _etl_loaded_at
    tables:
      - name: orders
      - name: product_skus
        freshness: {} # do not check freshness for this table

#15. What happens if one of my runs fails?

If you're using dbt Cloud, we recommend setting up email and Slack notifications (Account Settings > Notifications) for any failed runs. Then, debug these runs the same way you would debug any runs in development.

#16. One of my tests failed, how can I debug it?

To debug a failing test, find the SQL that dbt ran by:

  • dbt Cloud:
    • Within the test output, click on the failed test, and then select "Details"
  • dbt CLI:
    • Open the file path returned as part of the error message.
    • Navigate to the target/compiled/schema_tests directory for all compiled test queries

Copy the SQL into a query editor (in dbt Cloud, you can paste it into a new Statement), and run the query to find the records that failed.
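
For instance, the compiled query for a not_null test on a customer_id column (hypothetical names) looks roughly like this; if it returns a count greater than zero, the test fails:

select count(*)
from analytics.orders
where customer_id is null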

#17. The columns of my seed changed, and now I get an error when running the `seed` command, what should I do?

If you changed the columns of your seed, you may get a Database Error:

$ dbt seed
Running with dbt=0.16.0-rc2
Found 0 models, 0 tests, 0 snapshots, 0 analyses, 130 macros, 0 operations, 1 seed file, 0 sources
12:12:27 | Concurrency: 8 threads (target='dev_snowflake')
12:12:27 |
12:12:27 | 1 of 1 START seed file dbt_claire.country_codes...................... [RUN]
12:12:30 | 1 of 1 ERROR loading seed file dbt_claire.country_codes.............. [ERROR in 2.78s]
12:12:31 |
12:12:31 | Finished running 1 seed in 10.05s.
Completed with 1 error and 0 warnings:
Database Error in seed country_codes (data/country_codes.csv)
000904 (42000): SQL compilation error: error line 1 at position 62
invalid identifier 'COUNTRY_NAME'
Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

In this case, you should rerun the command with a --full-refresh flag, like so:

$ dbt seed --full-refresh

Why is this the case?

Typically, when you run dbt seed, dbt truncates the existing table and reinserts the data. This pattern avoids a drop cascade command, which may cause downstream objects (that your BI users might be querying!) to get dropped.

However, when column names are changed, or new columns are added, these statements will fail as the table structure has changed.

The --full-refresh flag will force dbt to drop cascade the existing table before rebuilding it.
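
Illustratively, the two behaviors compare like this (the exact statements vary by warehouse, and the table and values are hypothetical):

-- dbt seed: keep the existing table, replace the rows
truncate table analytics.country_codes;
insert into analytics.country_codes values ('AU', 'Australia');

-- dbt seed --full-refresh: drop and rebuild the table
drop table if exists analytics.country_codes cascade;
create table analytics.country_codes (country_code varchar, country_name varchar);
insert into analytics.country_codes values ('AU', 'Australia');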

#18. How do I do X? How do I solve Y problem? Where can I go to get help?

First, please check the relevant documentation and FAQs. Also, search for answers using your favorite search engine and in Discourse.

If you can’t find an answer, please take a few minutes to formulate your question well. Explaining the problems you are facing clearly will help others help you. See our guide on asking good questions.

Then, feel free to join our Slack community and ask it in the correct channel. Please read the rules of the road to make sure you know how to best participate in the community.

If you need dedicated support with building your dbt project, also consider engaging one of our consulting partners.

Note: If you are a dbt Cloud user and need help with one of the following issues, please reach out to us by using the speech bubble (💬) in the dbt Cloud interface.

  • Account setup (e.g. connection issues, repo connections)
  • Billing
  • Bug reports related to the web interface

As a rule of thumb, if you are using dbt Cloud, but your problem is related to code within your dbt project, then please follow the above process rather than reaching out to support.

#20. If models can only be `select` statements, how do I insert records?

For those coming from an ETL (Extract Transform Load) paradigm, there's often a desire to write transformations as insert and update statements. In comparison, dbt will wrap your select query in a create table as statement, which can feel counter-productive.

  • If you wish to use insert statements for performance reasons (i.e. to reduce the data that is processed), consider incremental models
  • If you wish to use insert statements because your source data is constantly changing (e.g. to create "Type 2 Slowly Changing Dimensions"), consider snapshotting your source data, and building models on top of your snapshots.

#21. How do I preserve leading zeros in a seed?

If you need to preserve leading zeros (for example in a zipcode or mobile number):

  1. v0.16.0 onwards: Include leading zeros in your seed file, and use the column_types configuration with a varchar datatype of the correct length (see the sketch below).
  2. Prior to v0.16.0: Use a downstream model to pad the leading zeros using SQL, for example: lpad(zipcode, 5, '0')
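
A minimal sketch of that configuration, assuming a seed named zipcodes with a zipcode column (both hypothetical):

dbt_project.yml
seeds:
  jaffle_shop:
    zipcodes:
      column_types:
        zipcode: varchar(5)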

#22. Can I use seeds to load raw data?

Seeds should not be used to load raw data (for example, large CSV exports from a production database).

Since seeds are version controlled, they are best suited to files that contain business-specific logic, for example a list of country codes or user IDs of employees.

Loading CSVs using dbt's seed functionality is not performant for large files. Consider using a different tool to load these CSVs into your data warehouse.

#23. How do I load data into my warehouse?

dbt assumes that you already have a copy of your data in your data warehouse. We recommend you use an off-the-shelf tool like Stitch or Fivetran to get data into your warehouse.

Can dbt be used to load data?

No, dbt does not extract or load data. It focuses on the transformation step only.

#24. How do I write long-form explanations in my descriptions?

If you need more than a sentence to explain a model, you can:

  1. Split your description over multiple lines (yaml docs), like so:
version: 2

models:
  - name: customers
    description: >
      Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
      tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
      quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
      consequat.

  2. Use a docs block to write the description in a Markdown file.
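
A short sketch of the docs block approach (the block and model names are hypothetical):

models/docs.md
{% docs customers_description %}
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua.
{% enddocs %}

models/schema.yml
version: 2

models:
  - name: customers
    description: '{{ doc("customers_description") }}'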

#25. Can I build my models in a schema other than my target schema?

Or: Can I split my models across multiple schemas?

Yes! Use the schema configuration in your dbt_project.yml file, or use a config block:

dbt_project.yml
name: jaffle_shop
...

models:
  jaffle_shop:
    marketing:
      schema: marketing # models in the `models/marketing/` subdirectory will use the marketing schema

models/customers.sql
{{
  config(
    schema='core'
  )
}}

#26. Should I use separate files to declare resource properties, or one large file?

It's up to you:

  • Some folks find it useful to have one file per model (or source / snapshot / seed etc)
  • Some find it useful to have one file per directory, documenting and testing multiple models in one file

Choose what works for your team. We have more recommendations in our guide on structuring dbt projects.

#27. Do all my tests go in one file?

No! You can use as many files as you want! Some folks find it useful to have one file per model; we tend to have one file per directory.

#28. Can I use environment variables in my profile?

Yes! Check out the docs on environment variables for more information.
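
As a sketch, here's a Postgres output that reads credentials from environment variables via dbt's env_var function (the variable names and connection details are hypothetical):

profiles.yml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: "{{ env_var('DBT_USER') }}"
      pass: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_alice
      threads: 4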

#29. What should I name my profile?

We typically use a company name for a profile name, and then use targets to differentiate between dev and prod. Check out the docs on managing environments for more information.
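
For example, a sketch of a profile for a hypothetical company, acme, with dev and prod targets that differ only in schema (connection details are hypothetical):

profiles.yml
acme:
  target: dev # default target for local development
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      pass: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dbt_alice
      threads: 4
    prod:
      type: postgres
      host: db.internal
      user: dbt
      pass: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: analytics
      threads: 8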

#30. What should I name my dbt project?

The name of your company (in snake_case) often makes for a good project name.

#31. Can I add tests and descriptions in a config block?

No — properties (including tests and descriptions) must be declared in separate .yml files, rather than in config blocks.

Configurations:

  • tell dbt how to build something in your warehouse (for example, whether a model should be a table or view, or which SQL to use when running a snapshot)
  • are defined in your dbt_project.yml file or a config block in a .sql file.

Properties:

  • are used to declare things about your dbt project or data warehouse.
  • are defined in separate .yml files

For now, these things are separate to each other, but in the future, we may choose to co-locate them in the same file.

#32. What tests should I add to my project?

We recommend that every model has a test on a primary key, that is, a column that is unique and not_null.

We also recommend that you test any assumptions on your source data. For example, if you believe that your payments can only be one of three payment methods, you should test that assumption regularly — a new payment method may introduce logic errors in your SQL.
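
For example, that payment-method assumption could be encoded as an accepted_values test (the model, column, and values here are hypothetical):

models/schema.yml
version: 2

models:
  - name: payments
    columns:
      - name: payment_method
        tests:
          - accepted_values:
              values: ['credit_card', 'bank_transfer', 'gift_card']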

In advanced dbt projects, we recommend using sources and running these source data-integrity tests against the sources rather than models.

#33. How do I remove deleted models from my data warehouse?

If you delete a model from your dbt project, dbt does not automatically drop the relation from your schema. This means that you can end up with extra objects in schemas that dbt creates, which can be confusing to other users.

(This can also happen when you switch a model from being a view or table to being ephemeral.)

When you remove models from your dbt project, you should manually drop the related relations from your schema.
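
For example (relation names are hypothetical):

drop table if exists analytics.old_model cascade;
drop view if exists analytics.deprecated_model;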

#34. If I can name these files whatever I'd like, what should I name them?

It's up to you! Here's a few options:

  • Default to the existing terminology: schema.yml (though this does make it hard to find the right file over time)
  • Use the same name as your directory (assuming you're using sensible names for your directories)
  • If you test and document one model (or seed, snapshot, macro etc.) per file, you can give it the same name as the model (or seed, snapshot, macro etc.)

Choose what works for your team. We have more recommendations in our guide on structuring dbt projects.

#35. How do I run models downstream of a seed?

You can run models downstream of a seed using the model selection syntax, and treating the seed like a model.

For example, the following would run all models downstream of a seed named country_codes:

$ dbt run --models country_codes+

#36. If I rerun dbt, will there be any downtime as models are rebuilt?

Nope! The SQL that dbt generates behind the scenes ensures that any relations are replaced atomically (i.e. your business users won't experience any downtime).

The implementation of this varies on each warehouse; check out the logs to see the SQL dbt is executing.

#37. How do I run one model at a time?

To run one model, use the --models flag (or -m flag), followed by the name of the model:

$ dbt run --models customers

Check out the model selection syntax documentation for more operators and examples.

#38. How do I run one snapshot at a time?

To run one snapshot, use the --select flag, followed by the name of the snapshot:

$ dbt snapshot --select order_snapshot

Check out the model selection syntax documentation for more operators and examples.

#39. How do I run models downstream of one source?

To run models downstream of a source, use the source: selector:

$ dbt run --models source:jaffle_shop+

(You can also use the -m shorthand here instead of --models)

To run models downstream of one source table:

$ dbt run --models source:jaffle_shop.orders+

Check out the model selection syntax for more examples!

#40. What should my profiles.yml file look like for my warehouse?

The structure of a profile looks different on each warehouse. Check out the supported databases page, and navigate to the Profile Setup section for your warehouse.

#41. Does my `.yml` file containing tests and descriptions need to be named `schema.yml`?

No! You can name this file whatever you want (including whatever_you_want.yml), so long as:

  • The file is in your models/ directory¹
  • The file has a .yml extension

Check out the docs for more information.

¹If you're declaring properties for seeds, snapshots, or macros, you can also place this file in the related directory — data/, snapshots/ and macros/ respectively.

#43. Can I build my seeds in a schema other than my target schema?

Or: Can I split my seeds across multiple schemas?

Yes! Use the schema configuration in your dbt_project.yml file.

dbt_project.yml
name: jaffle_shop
...

seeds:
  jaffle_shop:
    schema: mappings # all seeds in this project will use the mappings schema by default
    marketing:
      schema: marketing # seeds in the `data/marketing/` subdirectory will use the marketing schema

#44. How do I set a datatype for a column in my seed?

dbt will infer the datatype for each column based on the data in your CSV.

You can also explicitly set a datatype using the column_types configuration like so:

dbt_project.yml
seeds:
  jaffle_shop: # you must include the project name
    warehouse_locations:
      column_types:
        zipcode: varchar(5)

#45. Do hooks run with seeds?

Yes! The following hooks are available:

  • pre-hook
  • post-hook

Configure these in your dbt_project.yml file.
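
A minimal sketch, assuming you want to run a (hypothetical) grant statement after each seed is loaded:

dbt_project.yml
seeds:
  jaffle_shop:
    post-hook: "grant select on {{ this }} to reporter"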

#46. Why are profiles stored outside of my project?

Profiles are stored separately to dbt projects to avoid checking credentials into version control. Database credentials are extremely sensitive information and should never be checked into version control.

#47. How do I share my documentation with my team members?

If you're using dbt Cloud to deploy your project, and have the Team Plan, you can have up to 50 read-only users, who will be able to access the documentation for your project.

#48. How often should I run the snapshot command?

Snapshots are a batch-based approach to change data capture. The dbt snapshot command must be run on a schedule to ensure that changes to tables are actually recorded! While individual use-cases may vary, snapshots are intended to be run between hourly and daily. If you find yourself snapshotting more frequently than that, consider whether there's a more appropriate way to capture changes in your source data tables.

#49. Are the results of freshness stored anywhere?

Yes!

The snapshot-freshness command will output a pass/warning/error status for each table selected in the freshness snapshot.

Additionally, dbt will write the freshness results to a file in the target/ directory, named sources.json by default. To override this destination, use the -o flag with the snapshot-freshness command.
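
For example, to write the results to a different path (the filename here is hypothetical):

$ dbt source snapshot-freshness -o target/source_freshness.json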

#50. Do snapshots capture hard deletes?

No, snapshots do not invalidate records when they are deleted from the result set.

It's also possible to create your own snapshot strategy that handles hard deletes (docs) — this is an advanced usage of dbt!

#51. Do hooks run with snapshots?

Yes! The following hooks are available for snapshots:

  • pre-hook
  • post-hook

#52. What happens if I add new columns to my snapshot query?

When the columns of your source query change, dbt will attempt to reconcile this change in the destination snapshot table. dbt does this by:

  1. Creating new columns from the source query in the destination table
  2. Expanding the size of string types where necessary (e.g. varchars on Redshift)

dbt will not delete columns in the destination snapshot table if they are removed from the source query. It will also not change the type of a column beyond expanding the size of varchar columns. That is, if a string column is changed to a date column in the snapshot source query, dbt will not attempt to change the type of the column in the destination table.

#53. Why is there only one `target_schema` for snapshots?

Snapshots build into the same target_schema, no matter who is running them.

In comparison, models build into a separate schema for each user — this helps maintain separate development and production environments.

So, why the difference?

Let's assume you run your snapshots regularly in production. If snapshots were built into a different target schema in dev (e.g. dbt_claire) compared to prod (e.g. analytics), then when you ref the snapshot in dev, dbt would select from a snapshot table that has not been run regularly. This can make it hard to build models, since the data differs from prod.

Instead, in the models that ref your snapshots, it makes more sense to select from the production version of your snapshot, even when developing models. In this way, snapshot tables are more similar to source data than they are to proper dbt models.

For this reason, there is only one target_schema, which is not environment-aware by default.

However, this can create problems if you need to run a snapshot command when developing your models, or during a CI run. Fortunately, there's a few workarounds — check out this Discourse article.

#54. How do I snapshot freshness for one source only?

Use the --select flag to snapshot freshness for specific sources, e.g.:

# Snapshot freshness for all jaffle_shop tables:
$ dbt source snapshot-freshness --select jaffle_shop
# Snapshot freshness for a particular source table:
$ dbt source snapshot-freshness --select jaffle_shop.orders
# Snapshot freshness for multiple particular source tables:
$ dbt source snapshot-freshness --select jaffle_shop.orders jaffle_shop.customers

See the source snapshot-freshness command reference for more information.

#55. What if my source is in a poorly named schema or table?

By default, dbt will use the name: properties to construct the source reference.

If these names are a little less-than-perfect, use the schema and identifier properties to define the names as per the database, and use your name: property for the name that makes sense!

models/<filename>.yml
version: 2

sources:
  - name: jaffle_shop
    schema: postgres_backend_public_schema
    database: raw
    tables:
      - name: orders
        identifier: api_orders

In a downstream model:

select * from {{ source('jaffle_shop', 'orders') }}

Will get compiled to:

select * from raw.postgres_backend_public_schema.api_orders

#56. What if my source is in a different database to my target database?

Use the database property to define the database that the source is in.

models/<filename>.yml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    tables:
      - name: orders
      - name: customers
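
A select from this source, e.g. {{ source('jaffle_shop', 'orders') }}, would then compile to something like:

select * from raw.jaffle_shop.orders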

#57. I need to use quotes to select from my source, what should I do?

This is reasonably common on Snowflake in particular.

By default, dbt will not quote the database, schema, or identifier for the source tables that you've specified.

To force dbt to quote one of these values, use the quoting property:

models/<filename>.yml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    quoting:
      database: true
      schema: true
      identifier: true
    tables:
      - name: order_items
      - name: orders
        # This overrides the `jaffle_shop` quoting config
        quoting:
          identifier: false
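
With this configuration, selects from these sources would compile to something like the following (note the unquoted identifier for orders):

-- {{ source('jaffle_shop', 'order_items') }}
select * from "raw"."jaffle_shop"."order_items"

-- {{ source('jaffle_shop', 'orders') }}
select * from "raw"."jaffle_shop".orders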

#58. How do I specify column types?

Simply cast the column to the correct type in your model:

select
    id,
    created::timestamp as created
from some_other_table

You might have this question if you're used to running statements like this:

create table dbt_alice.my_table (
    id integer,
    created timestamp
);

insert into dbt_alice.my_table (
    select id, created from some_other_table
);

In comparison, dbt would build this table using a create table as statement:

create table dbt_alice.my_table as (
    select id, created from some_other_table
)

So long as your model queries return the correct column type, the table you create will also have the correct column type.

To define additional column options:

  • Rather than enforcing uniqueness and not-null constraints on your column, use dbt's testing functionality to check that your assertions about your model hold true.
  • Rather than creating default values for a column, use SQL to express defaults (e.g. coalesce(updated_at, current_timestamp()) as updated_at)
  • In edge-cases where you do need to alter a column (e.g. column-level encoding on Redshift), consider implementing this via a post-hook.

#59. Which SQL dialect should I write my models in?

Or:

Which SQL dialect does dbt use?

dbt can feel like magic, but it isn't actually magic. Under the hood, it's running SQL in your own warehouse — your data is not processed outside of your warehouse.

As such, your models should just use the SQL dialect of your own database. Then, when dbt wraps your select statements in the appropriate DDL or DML, it will use the correct syntax for your warehouse — all of this logic is written into dbt.

You can find more information about the databases dbt supports in the Supported Databases docs.

Want to go a little deeper on how this works? Consider a snippet of SQL that works on each warehouse:

models/test_model.sql
select 1 as my_column

To replace an existing table, here's an illustrative example of the SQL dbt will run on different warehouses (the actual SQL can get much more complicated than this!)

-- you can't `create or replace` on Redshift, so dbt uses a transaction to make the swap atomic
begin;

create table "dbt_alice"."test_model__dbt_tmp" as (
    select 1 as my_column
);

alter table "dbt_alice"."test_model" rename to "test_model__dbt_backup";
alter table "dbt_alice"."test_model__dbt_tmp" rename to "test_model";

commit;

begin;
drop table if exists "dbt_alice"."test_model__dbt_backup" cascade;
commit;

#60. What happens if the SQL in my query is bad?

Or:

I got a Database Error, what does that mean?

If there's a mistake in your SQL, dbt will return the error that your database returns.

$ dbt run --models customers
Running with dbt=0.15.0
Found 3 models, 9 tests, 0 snapshots, 0 analyses, 133 macros, 0 operations, 0 seed files, 0 sources
14:04:12 | Concurrency: 1 threads (target='dev')
14:04:12 |
14:04:12 | 1 of 1 START view model dbt_alice.customers.......................... [RUN]
14:04:13 | 1 of 1 ERROR creating view model dbt_alice.customers................. [ERROR in 0.81s]
14:04:13 |
14:04:13 | Finished running 1 view model in 1.68s.
Completed with 1 error and 0 warnings:
Database Error in model customers (models/customers.sql)
Syntax error: Expected ")" but got identifier `grand-highway-265418` at [13:15]
compiled SQL at target/run/jaffle_shop/customers.sql
Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Any models downstream of this model will also be skipped. Use the error message and the compiled SQL to debug any errors.

#61. How should I structure my project?

There's no one best way to structure a project! Every organization is unique.

If you're just getting started, check out how we (Fishtown Analytics) structure our dbt projects.

#62. What should I name my target?

We typically use targets to differentiate between development and production runs of dbt, naming the targets dev and prod respectively. Check out the docs on managing environments for more information.

#63. How do I test one model at a time?

Running tests on one model looks very similar to running a model: use the --models flag (or -m flag), followed by the name of the model:

$ dbt test --models customers

Check out the model selection syntax documentation for more operators and examples.

#64. How do I test and document seeds?

To test and document seeds, use a schema file and nest the configurations under a seeds: key.

Example:

data/schema.yml
version: 2

seeds:
  - name: country_codes
    description: A mapping of two letter country codes to country names
    columns:
      - name: country_code
        tests:
          - unique
          - not_null
      - name: country_name
        tests:
          - unique
          - not_null

#65. How do I run tests on just my sources?

To run tests on all sources, use the following command:

$ dbt test --models source:*

(You can also use the -m shorthand here instead of --models)

To run tests on one source (and all of its tables):

$ dbt test --models source:jaffle_shop

And, to run tests on one source table only:

$ dbt test --models source:jaffle_shop.orders

Yep, we know this syntax is a little less than ideal, so we're hoping to improve it in a future release. Check out the model selection syntax for more examples!

#66. Do model names need to be unique?

Yes! To build dependencies between models, you need to use the ref function. The ref function only takes one argument — the model name (i.e. the filename). As a result, these model names need to be unique, even if they are in distinct folders.

Often, this question comes up because users want to give two models the same name in their warehouse, splitting them across separate schemas (e.g. stripe.users and app.users). Check out the docs on custom aliases and custom schemas to achieve this.
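
As a sketch, two models named stripe_users.sql and app_users.sql (hypothetical filenames) could both render as a relation named users in different custom schemas:

models/stripe_users.sql
{{ config(schema='stripe', alias='users') }}

models/app_users.sql
{{ config(schema='app', alias='users') }}

Note that, by default, dbt concatenates a custom schema with your target schema (e.g. analytics_stripe); the custom schemas docs explain how to override this behavior.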

#67. I got an "unused model configurations" error message, what does this mean?

You might have forgotten to nest your configurations under your project name, or you might be trying to apply configurations to a directory that doesn't exist. Check out this article to understand more.

#68. When should I run my tests?

You should run your tests whenever you are writing new code (to ensure you haven't broken any existing models by changing SQL), and whenever you run your transformations in production (to ensure that your assumptions about your source data are still valid).

#69. Which materialization should I use for my model?

Start out with views, and then change models to tables when required for performance reasons (i.e. downstream queries have slowed).

Check out the docs on materializations for advice on when to use each materialization.

#70. How did dbt choose which schema to build my models in?

By default, dbt builds models in your target schema. To change your target schema:

  • If you're developing in dbt Cloud, these are set for each user when you first use a development environment.
  • If you're developing with the dbt CLI, this is the schema: parameter in your profiles.yml file.

If you wish to split your models across multiple schemas, check out the docs on using custom schemas

Note: on BigQuery, dataset is used interchangeably with schema.

#71. Why can't I just write DML in my transformations?

Or:

I'm already familiar with DML, and can write these statements manually, why should I use dbt to do this?

select statements make transformations accessible

More people know how to write select statements than DML, making the transformation layer accessible to more people!

Writing good DML is hard.

If you write the DDL / DML yourself, you can end up tangled in problems like:

  • What happens if the table already exists? Or this table already exists as a view, but now I want it to be a table?
  • What if the schema already exists? Or, should I check if the schema already exists?
  • How do I replace a model atomically (such that there's no downtime for someone querying the table)?
  • What if I want to parameterize my schema so I can run these transformations in a development environment?
  • What order do I need to run these statements in? If I run a cascade does it break other things?

Each of these problems can be solved, but they are unlikely to be the best use of your time.

dbt does more than generate SQL

You can test your models, generate documentation, create snapshots, and more!

You reduce your vendor lock-in

SQL dialects tend to diverge the most in DML and DDL (rather than in select statements) — check out the example here. Writing less SQL by hand can make a migration to a new database technology easier.

If you do need to write custom DML, there are ways to do this in dbt using custom materializations.

#72. Why do model and source yml files always start with `version: 2`?

Once upon a time, the structure of these .yml files was very different (s/o to anyone who was using dbt back then!). Adding version: 2 allowed us to make this structure more extensible.

Currently, Version 2 is the only supported version for these files. We kept version: around as a required key so that in the future, if we need to introduce a new structure for these files, we'll be able to do this more easily.

#73. Can I use a yaml file extension?

No. At present, dbt will only search for files with a .yml file extension. In a future release of dbt, dbt will also search for files with a .yaml file extension.