A dbt project tells dbt the context of your project and how to transform your data (that is, how to build your datasets). By design, dbt enforces the top-level structure of a dbt project, such as the dbt_project.yml file, the models directory, the snapshots directory, and so on. Within the top-level directories, you can organize your project in any way that meets the needs of your organization and data pipeline.
At a minimum, all a project needs is the dbt_project.yml project configuration file. dbt supports a number of different resources, so a project may also include:
|models||Each model lives in a single file and contains logic that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.|
|snapshots||A way to capture the state of your mutable tables so you can refer to it later.|
|seeds||CSV files with static data that you can load into your data platform with dbt.|
|tests||SQL queries that you can write to test the models and resources in your project.|
|macros||Blocks of code that you can reuse multiple times.|
|docs||Docs for your project that you can build.|
|sources||A way to name and describe the data loaded into your warehouse by your Extract and Load tools.|
|exposures||A way to define and describe a downstream use of your project.|
|metrics||A way for you to define metrics for your project.|
|analysis||A way to organize analytical SQL queries in your project such as the general ledger from your QuickBooks.|
When building out the structure of your project, you should consider these impacts on your organization's workflow:
- How would people run dbt commands — Selecting a path
- How would people navigate within the project — Whether as developers in the IDE or stakeholders from the docs
- How would people configure the models — Some bulk configurations are easier done at the directory level so people don’t have to remember to do everything in a config block with each new model
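The last point, bulk configuration at the directory level, might look roughly like the following fragment of dbt_project.yml. This is a minimal sketch: the project name my_project and the staging and marts directories are hypothetical, chosen only for illustration.

```yaml
# Fragment of dbt_project.yml (sketch; all names below are assumptions)
models:
  my_project:            # hypothetical project name; must match the `name` key
    staging:             # applies to every model under models/staging/
      +materialized: view
    marts:               # applies to every model under models/marts/
      +materialized: table
```

With a layout like this, a new model placed in either directory picks up its materialization without its own config block, and a path-based selection such as `dbt run --select path:models/staging` would build only the staging models.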
Every dbt project includes a project configuration file called dbt_project.yml. It defines the directory of the dbt project and other project configurations.
Edit dbt_project.yml to set up common project configurations such as:
|YAML key||Value description|
|name||Your project’s name in snake case|
|version||Version of your project|
|require-dbt-version||Restrict your project to only work with a range of dbt Core versions|
|profile||The profile dbt uses to connect to your data platform|
|model-paths||Directories where your model and source files live|
|seed-paths||Directories where your seed files live|
|test-paths||Directories where your test files live|
|analysis-paths||Directories where your analyses live|
|macro-paths||Directories where your macros live|
|snapshot-paths||Directories where your snapshots live|
|docs-paths||Directories where your docs blocks live|
|vars||Project variables you want to use for data compilation|
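
As a rough illustration of how these keys fit together, here is a sketch of a dbt_project.yml that uses only the keys from the table above. Every value is an assumption made for the example (the project name, profile name, version range, directory names, and the start_date variable are all hypothetical), and depending on your dbt version additional keys may be needed.

```yaml
# Sketch of a dbt_project.yml; all values are illustrative assumptions
name: my_project                                # project name in snake case
version: "1.0.0"                                # version of your project
require-dbt-version: [">=1.0.0", "<2.0.0"]      # allowed range of dbt Core versions
profile: my_profile                             # profile used to connect to your data platform

# Directories where each resource type lives
model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
analysis-paths: ["analyses"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
docs-paths: ["docs"]

# Project variables available at compile time
vars:
  start_date: "2020-01-01"                      # hypothetical variable
```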
For complete details on project configurations, see dbt_project.yml.
You can create new projects and share them with other people by making them available in a hosted Git repository on a service such as GitHub, GitLab, or Bitbucket.
During project initialization, dbt creates sample model files in your project directory to help you start developing quickly.
If you want to explore dbt projects in more depth, you can clone dbt Labs' Jaffle Shop on GitHub. It's a runnable project that contains sample configurations and helpful notes.
If you want to see what a mature, production project looks like, check out the GitLab Data Team public repo.