How to connect to your warehouse when using the CLI
When you invoke dbt from the CLI, dbt parses your
dbt_project.yml for the name of the
profile to use to connect to your data warehouse.
```yaml
# example dbt_project.yml file
name: 'jaffle_shop'
profile: 'jaffle_shop'
...
```
dbt then checks your
profiles.yml file for a profile with the same name. A profile contains all the details required to connect to your data warehouse.
This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control. By default, dbt expects the
profiles.yml file to be located in the ~/.dbt/ directory.
```yaml
# example profiles.yml file
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice
      threads: 4
```
Setting up your profiles.yml file for the first time
A ~/.dbt/profiles.yml file can be created by running dbt init.
What goes in my profiles.yml file?
In your profiles.yml file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use – for most organizations, this means you would only have one profile.
You can also configure some advanced dbt options in your profiles.yml file – see Additional profile configurations.
What goes in a profile?
A profile consists of targets, and a specified default target.
Each target specifies the type of warehouse you are connecting to, the credentials to connect to the warehouse, and some dbt-specific configurations.
The credentials you need to provide in your target vary across warehouses – sample profiles for each supported warehouse are available in the Supported Databases section.
Pro Tip: You may need to surround your password in quotes if it contains special characters. More details here.
Populating your profile
To populate your profile, copy the correct sample profile for your warehouse into your
profiles.yml file and update the details as follows:
- Profile name: Replace the name of the profile with a sensible name – it’s often a good idea to use the name of your organization. Make sure that this is the same name as the profile indicated in your dbt_project.yml file.
- target: This is the default target your dbt project will use. It must be one of the targets you define in your profile. Commonly it is set to dev.
- Populating your target:
  - type: The type of data warehouse you are connecting to
  - Warehouse credentials: Get these from your database administrator if you don’t already have them. Remember that user credentials are very sensitive information that should not be shared.
  - schema: The default schema that dbt will build objects in.
  - threads: The number of threads the dbt project will run on.
You can find more information on which values to use in your targets below.
Validating your warehouse credentials
Use the debug command to check whether you can successfully connect to your warehouse. Simply run
dbt debug from within a dbt project to test your connection.
Understanding targets in profiles
dbt supports multiple targets within one profile to encourage the use of separate development and production environments as discussed in Managing Environments.
A typical profile for an analyst using dbt locally will have a target named
dev, and have this set as the default.
You may also have a
prod target within your profile, which creates the objects in your production schema. However, since it's often desirable to perform production runs on a schedule, we recommend deploying your dbt project on a separate machine to your local machine. As such, most dbt users will only have a dev target in their profile on their local machine.
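For example, a profile with both a dev and a prod target might be structured as follows – this is an illustrative sketch for Postgres, and the hostnames, usernames, and schema names are placeholders, not prescribed values:

```yaml
# illustrative profiles.yml with two targets
jaffle_shop:
  target: dev          # the default target used when no --target is passed
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice    # per-developer schema
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      user: dbt_prod
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: analytics    # schema built for end-users
      threads: 8
```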
If you do have multiple targets in your profile, and want to use a target other than the default, you can do this using the
--target option when issuing a dbt command.
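For example, assuming your profile defines a target named prod:

```
$ dbt run --target prod
```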
Understanding warehouse credentials
We recommend that each dbt user has their own set of database credentials, including a separate user for production runs of dbt – this helps debug rogue queries, simplifies ownership of schemas, and improves security.
To allow dbt to run, the user credentials in your target must have appropriate privileges. While the exact privileges needed vary between data warehouses, at a minimum your user must be able to:
- read source data
- create schemas¹
- read system tables
Running dbt without create schema privileges
If your user is unable to be granted the privilege to create schemas, your dbt runs should instead target an existing schema that your user has permission to create relations within.
Understanding target schemas
The target schema represents the default schema that dbt will build objects into, and is often used as the differentiator between separate environments within a warehouse.
Schemas in BigQuery
dbt uses the term "schema" in a target across all supported warehouses for consistency. Note that in the case of BigQuery, a schema is actually a dataset.
The schema used for production should be named in a way that makes it clear that it is ready for end-users to use for analysis – we often name this schema analytics.
In development, a pattern we’ve found to work well is to name the schema in your dev target
dbt_<username>. Suffixing your name to the schema enables multiple users to develop in dbt: each user has their own separate schema for development, so users will not build over the top of each other, and object ownership and permissions stay consistent across an entire schema.
Note that there’s no need to create your target schema beforehand – dbt will check if the schema already exists when it runs, and create it if it doesn’t.
While the target schema represents the default schema that dbt will use, it may make sense to split your models into separate schemas, which can be done by using custom schemas.
Understanding threads
When dbt runs, it creates a directed acyclic graph (DAG) of links between models. The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can minimize the run time of your project.
For example, if you specify
threads: 1, dbt will start building only one model, and finish it, before moving on to the next. Specifying
threads: 8 means that dbt will work on up to 8 models at once without violating dependencies – the actual number of models it can work on will likely be constrained by the available paths through the dependency graph.
There's no set limit on the maximum number of threads you can set – while increasing the number of threads generally decreases execution time, there are a number of things to consider:
- Increasing the number of threads increases the load on your warehouse, which may impact other tools in your data stack. For example, if your BI tool uses the same compute resources as dbt, their queries may get queued during a dbt run.
- The number of concurrent queries your database will allow you to run may be a limiting factor in how many models can be actively built – some models may queue while waiting for an available query slot.
Generally the optimal number of threads depends on your data warehouse and its configuration. It’s best to test different values to find the best number of threads for your project. We recommend setting this to 4 to start with.
You can use a different number of threads than the value defined in your target by using the
--threads option when executing a dbt command.
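For example, to use 8 threads for a single invocation regardless of the value in your target:

```
$ dbt run --threads 8
```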
Advanced profile configuration
Using a custom profile directory
By default, dbt expects your
profiles.yml file to be located in the
~/.dbt/ directory. To check the expected location of your
profiles.yml file for your installation of dbt, you can run the following:
```
$ dbt debug --config-dir
To view your profiles.yml file, run:

open /Users/alice/.dbt
```
You may want to have your
profiles.yml file stored in a different directory – for example, if you are using environment variables to load your credentials, you might choose to include this file in your version controlled dbt project, and direct dbt to load the file from there.
Note that the file always needs to be called
profiles.yml, regardless of which directory it is in.
There are two ways to direct dbt to a different location for your profiles.yml file:
1. Use the
--profiles-dir option when executing a dbt command
This option can be used as follows:
$ dbt run --profiles-dir path/to/directory
If using this method, the
--profiles-dir option needs to be provided every time you run a dbt command.
2. Use the
DBT_PROFILES_DIR environment variable to change the default location
Specifying this environment variable overrides the directory that dbt looks for your
profiles.yml file in. You can specify this by running:
$ export DBT_PROFILES_DIR=path/to/directory
If the --profiles-dir option is used in a dbt command, it will take precedence over this environment variable.
Using environment variables in your profile
Credentials can either be placed directly into the
profiles.yml file, or they can be loaded from environment variables. This is especially useful for production deployments of dbt. You can find more information about using environment variables here.
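For example, dbt's env_var Jinja function reads a value from the environment at runtime, so the secret never appears in the file itself. The profile and environment variable names below are illustrative:

```yaml
# illustrative profiles.yml snippet reading a password from the environment
jaffle_shop:
  target: prod
  outputs:
    prod:
      type: postgres
      host: prod-db.example.com
      user: dbt_prod
      pass: "{{ env_var('DBT_PROD_PASSWORD') }}"  # resolved when dbt runs
      port: 5432
      dbname: jaffle_shop
      schema: analytics
      threads: 8
```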
Additional profile configurations
Partial parsing
Partial parsing can improve the performance characteristics of dbt runs by limiting the number of files that are parsed by dbt. Here, "parsing" means reading files in a dbt project from disk and capturing
config() method calls. dbt uses these method calls to determine 1) the shape of the dbt DAG and 2) the supplied configurations for dbt resources.
If partial parsing is enabled and files are unchanged between invocations of dbt, then dbt does not need to re-parse these files -- it can instead use the parsed representation from the last invocation of dbt. If a file has changed between invocations of dbt, then dbt will re-parse the file and update the parsed node cache accordingly.
Use caution when enabling partial parsing in dbt. If environment variables or variables specified on the CLI with
--vars control the parsed representation of your project, then the logic executed by dbt may differ from the logic specified in your project. Partial parsing should only be used when all of the logic in your dbt project is encoded in the files inside of that project.
To enable partial parsing in your project, use the --partial-parse dbt flag, or specify partial_parse: true in your profiles.yml file. The value in profiles.yml can be overridden using the --partial-parse or --no-partial-parse CLI flags.
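In profiles.yml, this setting lives under the top-level config block:

```yaml
config:
  partial_parse: true
```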
Using colors in terminal output
By default, dbt will colorize the output it prints in your terminal. You can turn this off by adding the following to your profiles.yml file:
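The relevant setting is use_colors, under the top-level config block:

```yaml
config:
  use_colors: False
```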
Configuring printer width
By default, dbt will print out lines padded to 80 characters wide. You can change this setting by adding the following to your profiles.yml file:
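For example, to pad lines to 120 characters instead, set printer_width under the top-level config block:

```yaml
config:
  printer_width: 120
```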
Anonymous usage stats
We want to build the best version of dbt possible, and a crucial part of that is understanding how users work with dbt. To this end, we've added some simple event tracking to dbt (using Snowplow). We do not track credentials, model contents or model names (we consider these private, and frankly none of our business).
Usage statistics are fired when dbt is invoked and when models are run. These events contain basic platform information (OS + python version). The schemas for these events can be seen here
By default this is turned on – you can opt out of event tracking at any time by adding the following to your profiles.yml file:
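The setting is send_anonymous_usage_stats, under the top-level config block:

```yaml
config:
  send_anonymous_usage_stats: False
```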