Connecting to your warehouse

How to connect to your warehouse when using the CLI

When you invoke dbt from the CLI, dbt parses your dbt_project.yml for the name of the profile to use to connect to your data warehouse.

# Example dbt_project.yml file
name: 'jaffle_shop'
profile: 'jaffle_shop'

dbt then checks your profiles.yml file for a profile with the same name. A profile contains all the details required to connect to your data warehouse.

This file generally lives outside of your dbt project to avoid sensitive credentials being checked in to version control. By default, dbt expects the profiles.yml file to be located in the ~/.dbt/ directory.

# example profiles.yml file
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice
      threads: 4

What goes in my profiles.yml file?

In your profiles.yml file, you can store as many profiles as you need. Typically, you would have one profile for each warehouse you use – for most organizations, this means you would only have one profile.
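As a sketch, a profiles.yml file with two profiles might look like the following (both profile names, and all connection details, are illustrative):

```yaml
# Two separate profiles in one profiles.yml file (names are illustrative)
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice
      threads: 4

# a second profile, for a second warehouse
internal_analytics:
  target: dev
  outputs:
    dev:
      type: postgres
      host: internal-db.example.com
      user: alice
      pass: <password>
      port: 5432
      dbname: internal
      schema: dbt_alice
      threads: 4
```

Each dbt project then selects one of these profiles by name via the profile: key in its dbt_project.yml file.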

You can also configure some advanced dbt options in your profiles.yml file, see Additional profile configurations.

What goes in a profile?

A profile consists of targets, and a specified default target.

Each target specifies the type of warehouse you are connecting to, the credentials to connect to the warehouse, and some dbt-specific configurations.

The credentials you need to provide in your target vary across warehouses – sample profiles for each supported warehouse are available in the Supported Databases section.

Pro Tip: You may need to surround your password in quotes if it contains special characters.
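For instance, a character such as a colon or hash is meaningful in YAML, so an unquoted password containing one may be misparsed. A hypothetical example:

```yaml
# quoting prevents YAML from misinterpreting special characters
# (this password is illustrative)
pass: 'pa55word:with@special#characters'
```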

Populating your profile

To populate your profile, copy the correct sample profile for your warehouse into your profiles.yml file and update the details as follows:

  • Profile name: Replace the name of the profile with a sensible name – it’s often a good idea to use the name of your organization. Make sure that this is the same name as the profile indicated in your dbt_project.yml file.
  • target: This is the default target your dbt project will use. It must be one of the targets you define in your profile. Commonly it is set to dev.
  • Populating your target:
    • type: The type of data warehouse you are connecting to
    • Warehouse credentials: Get these from your database administrator if you don’t already have them. Remember that user credentials are very sensitive information that should not be shared.
    • schema: The default schema that dbt will build objects in.
    • threads: The number of threads the dbt project will run on.

You can find more information on which values to use in your targets below.

Understanding profiles

Understanding targets

dbt supports multiple targets within one profile to encourage the use of separate development and production environments as discussed in Managing Environments.

A typical profile for an analyst using dbt locally will have a target named dev, and have this set as the default.

You may also have a prod target within your profile, which creates the objects in your production schema. However, since it's often desirable to perform production runs on a schedule, we recommend deploying your dbt project on a separate machine from your local machine. As such, most dbt users will only have a dev target in their profile on their local machine.
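A profile containing both a dev and a prod target might be sketched like this (host names and schema names are illustrative):

```yaml
jaffle_shop:
  target: dev           # dev is the default target
  outputs:
    dev:
      type: postgres
      host: localhost
      user: alice
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: dbt_alice  # each developer builds into their own schema
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      user: dbt_prod
      pass: <password>
      port: 5432
      dbname: jaffle_shop
      schema: analytics  # production objects land in the analytics schema
      threads: 4
```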

If you do have multiple targets in your profile, and want to use a target other than the default, you can do this using the --target option when issuing a dbt command.
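For example, to run your project against a (hypothetical) prod target instead of the default:

```shell
$ dbt run --target prod
```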

Understanding warehouse credentials

We recommend that each dbt user has their own set of database credentials, including a separate user for production runs of dbt – this helps debug rogue queries, simplifies ownership of schemas, and improves security.

To ensure the user credentials you use in your target allow dbt to run, you will need to ensure the user has appropriate privileges. While the exact privileges needed vary between data warehouses, at a minimum your user must be able to:

  • read source data
  • create schemas
  • read system tables

Understanding target schemas

The target schema represents the default schema that dbt will build objects into, and is often used as the differentiator between separate environments within a warehouse.

The schema used for production should be named in a way that makes it clear that it is ready for end-users to use for analysis – we often name this analytics.

In development, a pattern we’ve found to work well is to name the schema in your dev target dbt_<username>. Suffixing your username to the schema lets multiple users develop in dbt side by side: each user gets their own separate schema for development, so users won't build over the top of each other, and object ownership and permissions stay consistent across an entire schema.

Note that there’s no need to create your target schema beforehand – dbt will check if the schema already exists when it runs, and create it if it doesn’t.

While the target schema represents the default schema that dbt will use, it may make sense to split your models into separate schemas, which can be done by using custom schemas.
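As a sketch, custom schemas are typically configured on models in your dbt_project.yml file (the model directory and schema name here are hypothetical, and the exact key depends on your dbt version – newer versions use +schema):

```yaml
# dbt_project.yml (illustrative)
models:
  jaffle_shop:
    marketing:
      schema: marketing  # by default, built into <target schema>_marketing
```

Note that, by default, dbt concatenates the target schema and the custom schema, so in a dev target with schema dbt_alice these models would build into dbt_alice_marketing.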

Understanding threads

When dbt runs, it creates a directed acyclic graph (DAG) of links between models. The number of threads represents the maximum number of paths through the graph dbt may work on at once – increasing the number of threads can minimize the run time of your project.

For example, if you specify threads: 1, dbt will start building only one model, and finish it, before moving onto the next. Specifying threads: 8 means that dbt will work on up to 8 models at once without violating dependencies – the actual number of models it can work on will likely be constrained by the available paths through the dependency graph.

There's no set limit on the number of threads you can use – while increasing the number of threads generally decreases execution time, there are a number of things to consider:

  • Increasing the number of threads increases the load on your warehouse, which may impact other tools in your data stack. For example, if your BI tool uses the same compute resources as dbt, their queries may get queued during a dbt run.
  • The number of concurrent queries your database will allow you to run may be a limiting factor in how many models can be actively built – some models may queue while waiting for an available query slot.

Generally the optimal number of threads depends on your data warehouse and its configuration. It’s best to test different values to find the best number of threads for your project. We recommend setting this to 4 to start with.

You can use a different number of threads than the value defined in your target by using the --threads option when executing a dbt command.
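For example, to run with 8 threads regardless of the value in your target:

```shell
$ dbt run --threads 8
```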

Advanced profile configuration

Using a custom profile directory

By default, dbt expects your profiles.yml file to be located in the ~/.dbt/ directory. To check the expected location of your profiles.yml file for your installation of dbt, you can run the following:

$ dbt debug --config-dir
To view your profiles.yml file, run:
open /Users/alice/.dbt

You may want to have your profiles.yml file stored in a different directory – for example, if you are using environment variables to load your credentials, you might choose to include this file in your version controlled dbt project, and direct dbt to load the file from there.

Note that the file always needs to be called profiles.yml, regardless of which directory it is in.

There are two ways to direct dbt to a different location for your profiles.yml file:

1. Use the --profiles-dir option when executing a dbt command. This option can be used as follows:

$ dbt run --profiles-dir path/to/directory

If using this method, the --profiles-dir option needs to be provided every time you run a dbt command.

2. Use the DBT_PROFILES_DIR environment variable to change the default location. Specifying this environment variable overrides the directory that dbt looks for your profiles.yml file in. You can specify this by running:

$ export DBT_PROFILES_DIR=path/to/directory

If the --profiles-dir option is used in a dbt command, it will take precedence over this environment variable.

Using environment variables in your profile

Credentials can either be placed directly into the profiles.yml file, or they can be loaded from environment variables. This is especially useful for production deployments of dbt. You can find more information about using environment variables here.
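For example, using dbt's env_var function, a credential can be read from the environment at runtime rather than written into the file (the variable name here is hypothetical):

```yaml
# profiles.yml snippet – password read from an environment variable
pass: "{{ env_var('DBT_PASSWORD') }}"
```

With this in place, the profiles.yml file can safely be checked in to version control, since it no longer contains the secret itself.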

Additional profile configurations

Partial parsing

Partial parsing can improve the performance characteristics of dbt runs by limiting the number of files that are parsed by dbt. Here, "parsing" means reading files in a dbt project from disk and capturing ref() and config() method calls. dbt uses these method calls to determine 1) the shape of the dbt DAG and 2) the supplied configurations for dbt resources.

If partial parsing is enabled and files are unchanged between invocations of dbt, then dbt does not need to re-parse these files – it can instead use the parsed representation from the last invocation of dbt. If a file has changed between invocations of dbt, then dbt will re-parse the file and update the parsed node cache accordingly.

Use caution when enabling partial parsing in dbt. If environment variables or variables specified on the CLI with --vars control the parsed representation of your project, then the logic executed by dbt may differ from the logic specified in your project. Partial parsing should only be used when all of the logic in your dbt project is encoded in the files inside of that project.

To enable partial parsing in your project, use the --partial-parse dbt flag, or specify partial_parse: true in your profiles.yml file:

partial_parse: True

This value can be overridden using the --partial-parse or --no-partial-parse flags.

Using colors in terminal output

By default, dbt will colorize the output it prints in your terminal. You can turn this off by adding the following to your profiles.yml file:

use_colors: False

Configuring printer width

By default, dbt will print out lines padded to 80 characters wide. You can change this setting by adding the following to your profiles.yml file:

printer_width: 120

Usage statistics

We want to build the best version of dbt possible, and a crucial part of that is understanding how users work with dbt. To this end, we've added some simple event tracking to dbt (using Snowplow). We do not track credentials, model contents or model names (we consider these private, and frankly none of our business).

Usage statistics are fired when dbt is invoked and when models are run. These events contain basic platform information (OS and Python version). The schemas for these events can be seen here.

By default this is turned on – you can opt out of event tracking at any time by adding the following to your profiles.yml file:

send_anonymous_usage_stats: False